Abstract

The  stability  analysis  for  the  structure  from  motion  problem  presented  in  this  paper 
investigates  the  optimal  relationship  between  the  errors  in  the  estimated  translational 
and  rotational  parameters  of  a  rigid  motion  that  results  in  the  estimation  of  a  minimum 
number  of  negative  depth  values.  No  particular  estimators  are  used  and  no  specific 
assumptions  about  the  scene  are  made.  The  input  used  is  the  value  of  the  flow  along 
some  direction,  which  is  more  general  than  optic  flow  or  correspondence.  For  a  planar 
retina  it  is  shown  that  the  optimal  configuration  is  achieved  when  the  projections  of  the 
translational  and  rotational  errors  on  the  image  plane  are  perpendicular.  For  a  spherical 
retina,  given  a  rotational  error,  the  optimal  translation  is  the  correct  one,  while  given  a 
translational  error  the  optimal  rotational  error  is  normal  to  the  translational  one  at  an 
equal  distance  from  the  real  and  estimated  translations.  The  proofs,  besides  illuminating 
the  confounding  of  translation  and  rotation  in  structure  from  motion,  have  an  important 
application  to  ecological  optics.  The  same  analysis  provides  a  computational  explanation 
of  why  it  is  much  easier  to  estimate  self-motion  in  the  case  of  a  spherical  retina  and  why 
it  is  much  easier  to  estimate  shape  in  the  case  of  a  planar  retina,  thus  suggesting  that 
nature’s  design  of  compound  eyes  (or  panoramic  vision)  for  flying  systems  and  camera- 
type  eyes  for  primates  (and  other  systems  that  perform  manipulation)  is  optimal. 


The support of the Office of Naval Research under Grant N00014-96-1-0587 is gratefully
acknowledged.


1  Introduction 


The  general  problem  of  structure  from  motion  is  defined  as  follows:  given  a  number  of 
views  of  a  scene,  to  recover  the  rigid  transformations  between  the  views  and  the  structure 
(shape)  of  the  scene  in  view.  In  the  field  of  computational  vision  a  lot  of  effort  has 
been  devoted  to  this  problem  because  it  lies  at  the  heart  of  several  applications  in  pose 
estimation,  recognition,  calibration,  and  navigation  [8,  17].  For  reasons  related  to  the 
tractability  of  the  exposition  and  without  loss  of  generality,  we  consider  here  the  case  of 
differential  motion  for  a  camera  moving  in  a  static  environment  with  the  goal  of  recovering 
the  camera’s  3D  rigid  motion  and  the  structure  of  the  scene  [42,  4,  30].  The  problem  has 
been  traditionally  treated  in  a  two-step  approach.  The  first  step  attempts  to  establish  the 
correspondence  between  successive  image  frames,  i.e.,  to  identify  in  consecutive  images 
features  that  are  the  projections  of  the  same  feature  in  the  3D  scene.  Such  correspondence 
is  expressed  through  displacement  vectors  or  optic  flow — an  approximation  of  the  motion 
field  which  represents  the  projection  of  the  velocity  field  of  scene  points  on  the  image. 
The  second  step  attempts  to  interpret  this  correspondence  or  flow  field  and  recover  3D 
motion  and  structure. 

During  the  Eighties,  questions  related  to  the  uniqueness  of  solutions  were  answered  for 
both  the  discrete  case  of  point  matches  [24, 41]  and  the  differential  case  [25, 44].  This  work 
gave  rise  to  closed  form  solutions  and  opened  avenues  into  the  study  of  uniqueness  issues. 
Similar  problems  were  solved  in  the  photogrammetric  literature  [35].  The  algorithms 
developed  during  this  phase  of  research  were  based  on  two  frames  (or  views)  and  the  use 
of  point  features.  Algorithms  for  the  case  of  three  (or  multiple)  frames  were  introduced 
in  [40]  with  the  formulation  of  the  trilinear  constraints  and  were  generalized  in  [10] 
using  geometric  algebra.  At  the  same  time,  algorithms  appeared  that  made  use  of  line 
correspondences  [39],  as  well  as  algorithms  that  used  both  point  and  line  correspondences, 
thanks  to  the  trilinear  constraints  [40].  Also,  these  results  were  generalized  to  the  case  of 
uncalibrated  cameras,  a  situation  in  which  only  projective  (or,  under  some  assumptions, 
affine) structure can be recovered [9, 15, 28, 21].

The  promise  of  the  uniqueness  studies  gave  rise  to  an  exciting  quest  for  practical  and 
robust  algorithms  for  recovering  3D  structure  and  motion  from  image  sequences,  but  this 
was  soon  to  be  frustrated  by  high  sensitivity  to  noise  in  the  input  used  (optic  flow  or 
correspondence).  While  many  solutions  have  been  proposed,  they  become  problematic 
in  the  case  of  realistic  scenes  and  most  of  them  degrade  ungracefully  as  the  quality  of 
the input deteriorates. This has motivated research on the stability of the problem; [7]
contains an excellent survey of existing error analyses. We will discuss the most important
results in more technical detail in Section 2.2, after some mathematical prerequisites are
given in Section 2.1. In summary, the majority of the existing analyses attempt to model
the errors in either the 3D motion estimates or the depth estimates, and, due to the large
number of unknowns in the problem, they deal with restricted conditions such as planarity
of the scene in view or unbiasedness of the estimators. Notably absent in published efforts
is an account of the systematic nature of the errors in the depth estimates due to errors in
the 3D motion estimates. Put in different terms, there exists an interplay between 3D motion
and depth. In existing approaches, however, the highly correlated nature of the depth errors
at different image locations, due to 3D motion errors, is not reflected adequately. Furthermore, all analyses
are  based  on  the  two-step  approach,  analyzing  the  estimation  of  3D  motion  from  noise- 
contaminated  optic  flow  or  correspondence.  However,  as  has  been  shown  in  previous  work, 
the  estimation  of  3D  motion  does  not  necessarily  require  the  prior  computation  of  exact 
correspondence  [11,  12,  13,  20,  29].  Flow  measurements,  or  even  their  signs,  along  some 
direction  in  the  image,  such  as — for  example — the  one  provided  by  the  spatial  gradient, 
are  sufficient  for  recovering  3D  motion  [3].  Such  measurements  can  be  computed  by  even 
the  simplest  systems — biological  or  artificial — using,  for  example,  Reichardt  detectors  or 
equivalent  energy  models  [32,  33,  31,  43]. 

In this paper we take an approach that is independent of any algorithm or estimator.
Due to the geometry of image formation, any spatiotemporal representation in the image
is determined by the 3D motion and the structure of the scene in view. If the 3D motion
can be estimated correctly, the structure can be derived correctly using the equations of
image formation. However, an error in the estimation of the 3D motion will result in the
computation of a distorted version of the actual scene structure. Of computational interest
are those regions in space where the distortions are such that the depths become negative.
Without any scene interpretation, the only fact we know about the scene is that, for it to be
visible, it has to lie in front of the image, and thus the corresponding depth estimates have
to be positive. Therefore the number of image points whose corresponding scene points
would yield negative depth values due to erroneous 3D motion estimation should be kept
as small as possible. This is the computational principle behind the error analysis presented
in this paper. In particular, the following questions are studied. Assuming there is an error
in the estimation of the rotational motion components, what is the error in the translational
components that leads to a minimization of the negative depth values computed? Similarly,
if there is an error in the translational motion estimates, which rotational error will result
in the smallest number of negative depth values? The analysis is carried out for a complete
field of view as perceived by an imaging sphere, and for a restricted field of view on an
image plane of limited extent.


2  Overview  and  Problem  Statement 


2.1  Prerequisites 

We consider an observer moving rigidly with translation $\mathbf t = (U, V, W)$ and rotation
$\boldsymbol\omega = (\alpha, \beta, \gamma)$ in a stationary environment. Thus each scene point $\mathbf R = (X, Y, Z)$ measured
with respect to a coordinate system OXYZ fixed to the camera’s nodal point O has a
velocity $\dot{\mathbf R} = -\mathbf t - \boldsymbol\omega\times\mathbf R$ relative to the camera. The image formation is based on
perspective projection.

If the image is formed on a plane orthogonal to the Z axis at distance f from the
nodal point (see Figure 1), the image points $\mathbf r = (x, y, f)$ are related to the scene points
$\mathbf R$ through the equation

$$\mathbf r = \frac{f}{\mathbf R\cdot\mathbf z_0}\,\mathbf R$$

with $\mathbf z_0$ a unit vector in the direction of the Z axis and "$\cdot$" denoting the inner product.
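Thus, the 2D image velocity amounts to

$$\dot{\mathbf r} = \frac{1}{Z}\,\mathbf v_{tr}(\mathbf r) + \mathbf v_{rot}(\mathbf r) = -\frac{1}{Z}\big(\mathbf z_0\times(\mathbf t\times\mathbf r)\big) + \frac{1}{f}\big(\mathbf z_0\times(\mathbf r\times(\boldsymbol\omega\times\mathbf r))\big) \tag{1}$$

where $\mathbf v_{tr}(\mathbf r)/Z$ and $\mathbf v_{rot}(\mathbf r)$ are the translational and rotational flow components, respectively,
and $Z = \mathbf R\cdot\mathbf z_0$.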
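The following is a minimal numerical sanity check of equation (1), written as a Python/NumPy sketch; the function names and test values are hypothetical. It compares the closed-form flow with a central-difference derivative of the projection $\mathbf r = f\mathbf R/(\mathbf R\cdot\mathbf z_0)$ taken along the rigid motion $\dot{\mathbf R} = -\mathbf t - \boldsymbol\omega\times\mathbf R$, so agreement up to rounding indicates that the signs in (1) match this motion convention.

```python
import numpy as np

Z0 = np.array([0.0, 0.0, 1.0])  # unit vector along the optical (Z) axis


def project_plane(R, f):
    """Perspective projection onto the plane Z = f: r = f R / (R . z0)."""
    return f * R / R.dot(Z0)


def flow_plane(R, t, w, f):
    """Closed-form image velocity of equation (1)."""
    r = project_plane(R, f)
    Z = R.dot(Z0)
    v_tr = -np.cross(Z0, np.cross(t, r))                  # depth-independent part
    v_rot = np.cross(Z0, np.cross(r, np.cross(w, r))) / f
    return v_tr / Z + v_rot


def flow_plane_numeric(R, t, w, f, h=1e-6):
    """Central difference of the projection along dR/dt = -t - w x R."""
    R_dot = -t - np.cross(w, R)
    return (project_plane(R + h * R_dot, f) - project_plane(R - h * R_dot, f)) / (2 * h)


R = np.array([0.4, -0.3, 2.0])      # hypothetical scene point
t = np.array([0.2, 0.1, 1.0])       # hypothetical translation
w = np.array([0.01, -0.02, 0.03])   # hypothetical rotation
print(flow_plane(R, t, w, f=1.0))
print(flow_plane_numeric(R, t, w, f=1.0))
```

The third component of the closed-form flow is always zero, since every term is a cross product with $\mathbf z_0$; the flow stays in the image plane.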


Figure 1: Image formation under perspective projection on a planar retina: The instantaneous
rigid motion is described through a translation $\mathbf t = (U, V, W)$ and a rotation
$\boldsymbol\omega = (\alpha, \beta, \gamma)$. The focus of expansion (FOE), given by $(\frac{U}{W}f, \frac{V}{W}f)$, denotes the direction
of translation, and the AOR (axis of rotation point), given by $(\frac{\alpha}{\gamma}f, \frac{\beta}{\gamma}f)$, denotes the
intersection of the rotation axis and the image.


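Similarly, if the image is formed on a sphere of radius f (i.e., $\mathbf r\cdot\mathbf r = f^2$) (see Figure 2),
the image $\mathbf r = (x, y, z)$ of any point $\mathbf R$ is

$$\mathbf r = \frac{f\,\mathbf R}{|\mathbf R|}$$

with $|\mathbf R|$ being the norm of $\mathbf R$ and denoting the range; thus the 2D image motion is

$$\dot{\mathbf r} = \frac{1}{|\mathbf R|}\,\mathbf v_{tr}(\mathbf r) + \mathbf v_{rot}(\mathbf r) = \frac{1}{|\mathbf R|\,f}\big(\mathbf r\times(\mathbf r\times\mathbf t)\big) - \boldsymbol\omega\times\mathbf r \tag{2}$$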
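Equation (2) can be checked the same way. The sketch below (hypothetical names and values) differentiates the spherical projection $\mathbf r = f\mathbf R/|\mathbf R|$ numerically along $\dot{\mathbf R} = -\mathbf t - \boldsymbol\omega\times\mathbf R$ and compares the result with the closed form; it also confirms that the flow is tangent to the sphere.

```python
import numpy as np


def project_sphere(R, f):
    """Spherical projection r = f R / |R| onto a sphere of radius f."""
    return f * R / np.linalg.norm(R)


def flow_sphere(R, t, w, f):
    """Closed-form image motion of equation (2)."""
    r = project_sphere(R, f)
    rng = np.linalg.norm(R)                       # the range |R|
    return np.cross(r, np.cross(r, t)) / (rng * f) - np.cross(w, r)


def flow_sphere_numeric(R, t, w, f, h=1e-6):
    """Central difference of the spherical projection along dR/dt = -t - w x R."""
    R_dot = -t - np.cross(w, R)
    return (project_sphere(R + h * R_dot, f) - project_sphere(R - h * R_dot, f)) / (2 * h)


R = np.array([-1.0, 0.5, 2.0])      # hypothetical scene point
t = np.array([0.3, -0.2, 0.9])      # hypothetical translation
w = np.array([0.02, 0.01, -0.03])   # hypothetical rotation
f = 2.0
print(flow_sphere(R, t, w, f))
print(flow_sphere_numeric(R, t, w, f))
```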

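The component of the flow $u_n$ along any direction $\mathbf n$ is therefore

$$u_n = \dot{\mathbf r}\cdot\mathbf n = \frac{\mathbf v_{tr}}{Z}\cdot\mathbf n + \mathbf v_{rot}\cdot\mathbf n \qquad\text{or}\qquad u_n = \dot{\mathbf r}\cdot\mathbf n = \frac{\mathbf v_{tr}}{|\mathbf R|}\cdot\mathbf n + \mathbf v_{rot}\cdot\mathbf n \tag{3}$$

for the planar and the spherical retina, respectively.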


As can be seen from equations (1) and (2), the effects of translation and scene structure
cannot be disentangled, and thus we can only obtain the direction of translation $\mathbf t/|\mathbf t|$ and
the depth (range) of the scene up to a scaling factor, that is, $Z/|\mathbf t|$ ($|\mathbf R|/|\mathbf t|$). For
simplicity, we will assume $\mathbf t$ to be of length 1 and will no longer mention the scaling
in the computation of structure.
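The scale ambiguity can be seen numerically. In the sketch below (hypothetical values, spherical retina of radius f = 1), scaling the translation and the scene point by the same factor k leaves the flow of equation (2) unchanged, while scaling the translation alone does not.

```python
import numpy as np


def sphere_flow(R, t, w, f=1.0):
    """Equation (2): image motion of scene point R on a sphere of radius f."""
    r = f * R / np.linalg.norm(R)
    return np.cross(r, np.cross(r, t)) / (f * np.linalg.norm(R)) - np.cross(w, r)


R = np.array([1.0, 2.0, 3.0])       # hypothetical scene point
t = np.array([0.3, -0.4, 0.8])      # hypothetical translation
w = np.array([0.01, 0.02, -0.01])   # hypothetical rotation

# Only t/|t| and |R|/|t| are observable: (k t, k R) gives the same flow as (t, R).
for k in (0.5, 2.0, 10.0):
    print(k, np.allclose(sphere_flow(k * R, k * t, w), sphere_flow(R, t, w)))
```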


Figure  2:  Image  formation  under  perspective  projection  on  a  spherical  retina. 


2.2  Previous  work 

It  is  in  general  a  very  hard  task  to  develop  analytical  results  about  the  stability  or  error 
sensitivity  of  structure  from  motion.  This  is  due  to  the  nonlinearities  and  the  large 
number  of  parameters  that  are  involved.  As  a  result  a  fair  number  of  observations  and 
intuitive  arguments  have  been  developed  by  a  multitude  of  authors  over  the  years.  Most 
important,  a  small  number  of  studies  have  given  rise  to  three  crisp  results  regarding  noise 
sensitivity  in  structure  from  motion  [7].  These  are: 

(a) A translation can be easily confounded with a rotation in the case of a small field
of view under the assumption of lateral motion and insufficient variation of depth
[1, 6]. Intuitively, translation along the x axis can be confused with rotation around
the y axis, and translation along the y axis with rotation around the x axis. Evidence
for this result can be obtained intuitively from the flow equation (1). As can be
seen, if the scene in view is a plane, then the flow becomes a polynomial in the
retinal coordinates x, y, with the terms $t_1 + \omega_2$ and $t_2 - \omega_1$ representing the zero-order
terms (writing $\mathbf t = (t_1, t_2, t_3)$ and $\boldsymbol\omega = (\omega_1, \omega_2, \omega_3)$, with the depth of the plane absorbed
into the scale of $\mathbf t$). An elegant proof of this fact using techniques from estimation theory
has been presented by Daniilidis [6] for the case of unbiased estimators.

(b) Usually an error metric is developed whose minimization provides a solution for 3D
motion and subsequently for structure. If this metric is not appropriately normalized,
in the case of a small field of view the translation estimate is biased toward
the viewing direction. This can be seen directly from the epipolar constraint. In its
instantaneous form, assuming f = 1, the epipolar constraint becomes
$(\mathbf t\times\mathbf r)\cdot(\dot{\mathbf r} + \boldsymbol\omega\times\mathbf r) = 0$. In the discrete case, if $\mathbf r_1$ and $\mathbf r_2$ are corresponding image
points before and after the motion, the constraint is $\mathbf r_2\cdot(\mathbf t\times R\,\mathbf r_1) = 0$, where R represents
the rotation matrix. A solution coming from the minimization of $\sum_i \|\mathbf r_{2i}\cdot(\mathbf t\times R\,\mathbf r_{1i})\|^2$
is bound to be biased, because the cross product introduces the sine of the angle between
the vectors $\mathbf t$ and $R\,\mathbf r_{1i}$ as a factor. This, in turn, makes the minimization prefer vectors
$\mathbf t$ that are closer to the center of gravity of the points $R\,\mathbf r_{1i}$, so that the sine, and
hence the residual, is smaller [36, 38]. Techniques from statistics such as maximum
likelihood estimation [36] or Rayleigh optimization [37] can be used to deal with
the bias, but they have their own problems.

(c) The third result is due to Maybank [26, 27], who showed that in the case of a
small field of view and an irregular surface, the cost function resulting from the
epipolar constraint, $\sum_i \|(\mathbf t\times\mathbf r_i)\cdot(\dot{\mathbf r}_i + \boldsymbol\omega\times\mathbf r_i)\|^2$, takes its minima along a line in
the space of translation directions which passes through the true translation and
(not surprisingly, due to the small field of view assumption) the viewing direction.
This means that the tilt of the direction of $\mathbf t$ can be estimated more reliably than
its slant.
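To illustrate result (a), the sketch below, our own construction rather than the analysis of [1, 6], evaluates equation (1) for a fronto-parallel plane at constant depth. This plane is an assumption standing in for "insufficient variation of depth". The sketch compares the flow of a unit translation along x with that of a rotation about y whose zero-order term matches it at the image center. The helper names and parameters are hypothetical, and f = 1.

```python
import numpy as np

Z0 = np.array([0.0, 0.0, 1.0])


def planar_flow(r, Z, t, w, f=1.0):
    """Equation (1) at image point r = (x, y, f) whose scene point has depth Z."""
    return (-np.cross(Z0, np.cross(t, r)) / Z
            + np.cross(Z0, np.cross(r, np.cross(w, r))) / f)


def max_relative_difference(half_width, depth=10.0, samples=21):
    """Largest relative difference between a lateral translation and a matched rotation.

    Assumes a fronto-parallel plane at constant depth and f = 1; half_width is the
    half-size of the square image in units of f.
    """
    t = np.array([1.0, 0.0, 0.0])            # translation along the x axis
    w = np.array([0.0, 1.0 / depth, 0.0])    # rotation about y, matched at the center
    worst = 0.0
    for x in np.linspace(-half_width, half_width, samples):
        for y in np.linspace(-half_width, half_width, samples):
            r = np.array([x, y, 1.0])
            u_trans = planar_flow(r, depth, t, np.zeros(3))
            u_rot = planar_flow(r, depth, np.zeros(3), w)
            worst = max(worst, np.linalg.norm(u_trans - u_rot) / np.linalg.norm(u_trans))
    return worst


print(max_relative_difference(0.05))   # about a 6 degree field of view (side to side)
print(max_relative_difference(1.0))    # a 90 degree field of view
```

For the narrow field the two flow fields differ by well under one percent everywhere. For the wide field the second-order terms of the rotational flow make them easy to tell apart.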

Additional  important  work  has  been  concerned  with  the  study  of  configurations  of  scene 
points  that  give  rise  to  multiple  solutions  from  point  correspondences  [18],  the  so-called 
ambiguity-critical  surfaces.  It  has  been  shown  by  Horn  [19]  that  the  epipolar  constraint 
is  not  affected  by  first-order  deformations  of  the  motion  parameters  if  the  points  lie  on  a 
quadric  surface  with  certain  properties.  The  relationship  between  these  instability-critical 
surfaces  and  the  ambiguity-critical  surfaces  has  been  established  in  [6,  16]. 

Next,  we  study  the  relationship  between  errors  in  the  estimation  of  the  3D  motion 
and  errors  in  the  estimation  of  the  depth  of  the  scene.  This  relationship  is  the  basis  for 
our  subsequent  error  analysis. 

2.3  Distorted  space 

Based on an exact computation of the motion parameters, the depth (range) can be
derived from equation (3). Let us assume, however, that there is an error in the estimation
of the five motion parameters, that is, the two parameters of the direction of translation
and the three parameters of rotation. As a consequence there will also be errors in the
estimation of depth (range), and thus a distorted version of the space will be computed.
A convenient way to describe the distortion of space is to sketch it through surfaces in
space which are distorted by the same multiplicative factor, the iso-distortion surfaces
[5, 14].

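In the following, in order to distinguish between the various estimates, we use letters
with hat signs to represent the estimated quantities ($\hat{\mathbf t}, \hat{\boldsymbol\omega}, |\hat{\mathbf R}|, \hat Z, \hat{\mathbf v}_{tr}, \hat{\mathbf v}_{rot}$) and unmarked
letters to represent the actual quantities ($\mathbf t, \boldsymbol\omega, |\mathbf R|, Z, \mathbf v_{tr}, \mathbf v_{rot}$). The subscript "e" is used
to denote errors, where we define $\boldsymbol\omega - \hat{\boldsymbol\omega} = \boldsymbol\omega_e$ and $\mathbf v_{rot} - \hat{\mathbf v}_{rot} = \mathbf v_{rot_e}$.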

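The estimated depth or range can be obtained from equation (3) as

$$\hat Z \;(\text{or } |\hat{\mathbf R}|) = \frac{\hat{\mathbf v}_{tr}\cdot\mathbf n}{\dot{\mathbf r}\cdot\mathbf n - \hat{\mathbf v}_{rot}\cdot\mathbf n}$$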

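and we have on the image plane

$$\hat Z = Z\left(\frac{-f\,\big(\mathbf z_0\times(\hat{\mathbf t}\times\mathbf r)\big)\cdot\mathbf n}{-f\,\big(\mathbf z_0\times(\mathbf t\times\mathbf r)\big)\cdot\mathbf n + Z\,\big(\mathbf z_0\times(\mathbf r\times(\boldsymbol\omega_e\times\mathbf r))\big)\cdot\mathbf n}\right) \tag{4}$$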


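and on the image sphere

$$|\hat{\mathbf R}| = |\mathbf R|\left(\frac{\big(\mathbf r\times(\mathbf r\times\hat{\mathbf t})\big)\cdot\mathbf n}{\big(\mathbf r\times(\mathbf r\times\mathbf t)\big)\cdot\mathbf n - f\,|\mathbf R|\,(\boldsymbol\omega_e\times\mathbf r)\cdot\mathbf n}\right) \tag{5}$$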

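From equation (4) it can be seen that $\hat Z$ can be expressed as a multiple of Z, where the
multiplicative factor, which we denote by D, the distortion factor, is given by the term
inside the brackets. Thus the distortion factor is

$$D = \frac{-f\,\big(\mathbf z_0\times(\hat{\mathbf t}\times\mathbf r)\big)\cdot\mathbf n}{-f\,\big(\mathbf z_0\times(\mathbf t\times\mathbf r)\big)\cdot\mathbf n + Z\,\big(\mathbf z_0\times(\mathbf r\times(\boldsymbol\omega_e\times\mathbf r))\big)\cdot\mathbf n} \tag{6}$$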


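Similarly, we can interpret the estimated range in equation (5) as a multiple of the actual
range with distortion D, where

$$D = \frac{\big(\mathbf r\times(\mathbf r\times\hat{\mathbf t})\big)\cdot\mathbf n}{\big(\mathbf r\times(\mathbf r\times\mathbf t)\big)\cdot\mathbf n - f\,|\mathbf R|\,(\boldsymbol\omega_e\times\mathbf r)\cdot\mathbf n} \tag{7}$$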


Equations (6) and (7) describe, for any fixed direction $\mathbf n$ and any distortion factor D, a
surface in space. Any such surface is to be understood as the locus of points in space
which are distorted in depth (range) by the same factor D, if the corresponding image
measurements are in direction $\mathbf n$.
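As a consistency check between equations (3), (5), and (7), the sketch below simulates the normal flow with the true motion and then recovers the range with an erroneous motion estimate $(\hat{\mathbf t}, \hat{\boldsymbol\omega})$. The result should equal $|\mathbf R|$ times the distortion factor D of (7). The image point, direction, and motion values are hypothetical, and the retina is the unit sphere (f = 1).

```python
import numpy as np


def v_tr(r, t, f):
    """Depth-independent translational flow on the sphere: (r x (r x t)) / f."""
    return np.cross(r, np.cross(r, t)) / f


def v_rot(r, w):
    """Rotational flow on the sphere: -w x r."""
    return -np.cross(w, r)


def estimated_range(r, n, rng, t, w, t_hat, w_hat, f=1.0):
    """Range recovered from the normal flow u_n when the motion estimate is wrong."""
    u_n = (v_tr(r, t, f) / rng + v_rot(r, w)).dot(n)      # equation (3), true motion
    return v_tr(r, t_hat, f).dot(n) / (u_n - v_rot(r, w_hat).dot(n))


def distortion_factor(r, n, rng, t, t_hat, w_e, f=1.0):
    """Distortion factor D of equation (7), with w_e = w - w_hat."""
    num = np.cross(r, np.cross(r, t_hat)).dot(n)
    den = np.cross(r, np.cross(r, t)).dot(n) - f * rng * np.cross(w_e, r).dot(n)
    return num / den


def unit(v):
    return v / np.linalg.norm(v)


r = unit(np.array([0.3, -0.5, 0.8]))            # image point on the unit sphere
n = unit(np.cross(r, [0.0, 0.0, 1.0]))          # a tangent flow direction at r
t, t_hat = unit(np.array([0.2, 0.1, 1.0])), unit(np.array([0.3, 0.0, 1.0]))
w, w_hat = np.array([0.01, 0.02, -0.01]), np.array([0.0, 0.025, -0.01])
rng = 5.0                                       # the true range |R|
print(estimated_range(r, n, rng, t, w, t_hat, w_hat))
print(rng * distortion_factor(r, n, rng, t, t_hat, w - w_hat))
```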

It should be emphasized that the distortion of depth also depends on the direction
$\mathbf n$ of the flow measurement (hereafter called the flow direction) used as the basis for the
computations, and therefore is different for different directions of flow. This means simply
that if one estimates depth from optical flow in the presence of errors, the results can be
very different, depending on whether the horizontal, vertical, or any other component is
used. Depending on the direction, any value between $+\infty$ and $-\infty$ can be obtained!
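The dependence on the flow direction can be made concrete with a short sketch. It evaluates equation (7) at a single image point for directions spanning the tangent plane. It assumes a unit sphere (f = 1, $|\mathbf r| = 1$), and all motion values are hypothetical. Because D is unchanged when $\mathbf n$ is replaced by $-\mathbf n$, half a turn of directions covers every case.

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def distortion_over_directions(r, rng, t, t_hat, w_e, samples=360):
    """Distortion factor D of equation (7) for tangent directions at r (unit sphere)."""
    helper = np.array([0.0, 0.0, 1.0]) if abs(r[2]) < 0.9 else np.array([1.0, 0.0, 0.0])
    e1 = unit(np.cross(r, helper))                               # orthonormal tangent basis
    e2 = np.cross(r, e1)
    a = np.cross(r, np.cross(r, t_hat))                          # numerator vector of (7)
    b = np.cross(r, np.cross(r, t)) - rng * np.cross(w_e, r)     # denominator vector of (7)
    angles = np.linspace(0.0, np.pi, samples, endpoint=False)
    ns = np.outer(np.cos(angles), e1) + np.outer(np.sin(angles), e2)
    return angles, (ns @ a) / (ns @ b)


angles, D = distortion_over_directions(
    r=unit(np.array([1.0, 1.0, 1.0])), rng=5.0,
    t=np.array([0.0, 0.0, 1.0]), t_hat=unit(np.array([0.3, 0.0, 1.0])),
    w_e=np.array([0.02, -0.01, 0.0]))
print(D.min(), D.max())                    # both signs occur at the same image point
print(np.degrees(angles[D < 0]).round())   # directions that yield a negative range
```

Whenever the numerator and denominator vectors are not parallel, a · n and b · n vanish at different angles. D is therefore negative on the wedge between those angles and sweeps through arbitrarily large positive and negative values near the zero of the denominator.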

In  the  analysis  in  this  paper,  we  are  not  interested  in  actual  3D  space,  but  we  consider 
the  surfaces  in  visual  space,  that  is,  the  space  perceived  under  perspective  projection 
where  the  dimensions  parallel  to  the  image  are  measured  according  to  the  size  with 
which  they  appear  on  the  image. 

Figure  3a  gives  an  example  of  an  iso-distortion  surface,  and  Figure  3b  illustrates  a 
family  of  iso-distortion  surfaces  corresponding  to  the  same  gradient  direction  but  different 
distortion  factors  D.  The  same  family  is  intersected  with  the  xZ  plane  in  Figure  3c.  In 
the  plane  the  intersections  give  rise  to  a  family  of  contours. 

As can be seen, the iso-distortion surfaces of a family intersect in a curve, and they
change continuously as we vary D. Thus all the space between the 0 distortion surface
and the $-\infty$ distortion surface (which is also the $+\infty$ distortion surface) is distorted by
a negative distortion factor.

2.4  Description  of  results 

In the forthcoming sections we employ a geometric statistical model to represent the
negative depth values. We assume that the scene in view lies within a certain depth
(range) interval between a minimum value and a maximum value. The flow measurements
in the image are taken along different directions, and we assume some distribution for
these directions. Our focus is on the points in space which, for a given 3D motion estimate,
yield negative depth (range) estimates.

For every direction $\mathbf n$, the points in space with negative depth estimates fill the
space between the 0 and the $-\infty$ distortion surfaces within the range covered by the scene.
Thus for every direction we obtain a 3D subspace covering a certain volume. The sum of
all volumes for all directions, normalized by the flow distributions considered, represents
a measure of the likelihood that negative depth values occur. We call it the "negative
depth volume" or "negative range volume." The idea behind our error analysis lies in
the minimization of this negative depth (range) volume; that is, we are interested in the
relationship between the translational and rotational motion errors that minimizes this
volume.

In  our  analysis  we  do  not  want  to  make  any  particular  scene-related  assumptions 
favoring  particular  orientations  or  depth  values.  We  wish  to  treat  all  depth  values  and 
flow directions as having equal importance. To be more precise, we assume that the flow
directions are uniformly distributed in every direction and at every depth (range) between
a minimum value $Z_{min}$ ($|\mathbf R|_{min}$) and a maximum value $Z_{max}$ ($|\mathbf R|_{max}$). We do not wish to
assume any particular distribution for the noise in the flow measurements; therefore,
we do not consider any noise in the measurements. Thus, one can view our analysis as a
geometric investigation of the inherent confounding of translation and rotation, which is
the reason behind the instability in structure from motion.
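A rough Monte Carlo proxy for the negative range volume can make this setting concrete. The sketch below assumes a unit sphere, image points uniform on the sphere, flow directions uniform in each tangent plane, ranges uniform in $[|\mathbf R|_{min}, |\mathbf R|_{max}]$, and noise-free measurements. It counts the samples for which equation (5) gives a negative range. This is a sampling illustration of our own, not the analytic integration carried out in the following sections, and the function name and parameter values are hypothetical.

```python
import numpy as np


def negative_range_fraction(t, t_hat, w_e, r_min=1.0, r_max=10.0, samples=200_000, seed=0):
    """Fraction of (point, direction, range) samples with a negative range estimate."""
    gen = np.random.default_rng(seed)
    r = gen.normal(size=(samples, 3))
    r /= np.linalg.norm(r, axis=1, keepdims=True)              # uniform on the sphere
    g = gen.normal(size=(samples, 3))
    n = g - np.sum(g * r, axis=1, keepdims=True) * r           # project onto tangent plane
    n /= np.linalg.norm(n, axis=1, keepdims=True)              # uniform tangent direction
    ranges = gen.uniform(r_min, r_max, size=samples)

    a = np.sum(np.cross(r, np.cross(r, t_hat)) * n, axis=1)   # estimated translational term
    b = np.sum(np.cross(r, np.cross(r, t)) * n, axis=1)       # true translational term
    c = np.sum(-np.cross(w_e, r) * n, axis=1)                 # rotational-error term
    # Equation (5) with f = 1: |R_hat| = |R| a / (b + |R| c) is negative iff the signs differ.
    return np.mean(a * (b + ranges * c) < 0)


t = np.array([0.0, 0.0, 1.0])                      # true translation
w_e = np.array([0.02, 0.0, 0.0])                   # hypothetical rotational error
tilt = np.radians(30.0)
t_off = np.array([np.sin(tilt), 0.0, np.cos(tilt)])
print(negative_range_fraction(t, t, w_e))          # correct translation
print(negative_range_fraction(t, t_off, w_e))      # translation 30 degrees off
```

With no motion error at all the fraction is exactly zero. With a fixed rotational error, the correct translation should give a noticeably smaller fraction than a translation that is 30° off, in line with result (a) below. A single configuration is of course only an illustration, not a proof.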

In  summary,  as  an  answer  to  the  question  about  the  coupling  of  motion  errors,  the 
following  results  are  obtained: 

(a) If we take the whole sphere as the imaging surface and we assume an error in the
estimation of rotation, then the direction of translation that minimizes the negative
depth volume is the correct direction of translation.

Figure 3: (a) Iso-distortion surface in xyZ space. The parameters are: $(x_0, y_0) = (\frac{U}{W}f, \frac{V}{W}f) = (-50, -25)$,
$(\hat x_0, \hat y_0) = (\frac{\hat U}{\hat W}f, \frac{\hat V}{\hat W}f) = (0, -20)$, $\boldsymbol\omega_e = (\alpha_e, \beta_e, \gamma_e) = (-0.005, 0.001, 0.003)$, D = 1.5,
$\mathbf n = (1, 0)$, f = 500 (corresponding to a field of view of 50°). (b) Family of iso-distortion
surfaces for the same motion parameters ($\mathbf n = (1, 0)$). (c) Corresponding iso-distortion
contours in the xZ plane.

The  practical  implication  of  this  result  is  that  3D  motion  estimation  is  most  easily 
accomplished  for  a  complete  field  of  view,  as  provided  by  an  imaging  sphere.  A 
working  system  (biological  or  artificial)  is  usually  equipped  with  an  inertial  sensor 
which  provides  rotational  information,  though  probably  with  some  error.  On  the 
basis  of  this  information,  the  best  one  can  do  to  estimate  the  remaining  translation 
is  to  assume  that  the  flow  field  obtained  by  subtracting  the  estimated  rotation 
is  purely  translational  and  apply  a  simple  algorithm  designed  for  only  translation 
[2,  20,  29,  34]. 

Such algorithms, if based only on the constraint that the depth is positive, are
formulated basically as constrained minimization problems. The underlying idea
is illustrated in Figure 4. Assuming the observer is approaching the scene, the
translational 2D motion vector at every point points away from the FOE (the point where
the translation axis pierces the image). Thus, given the projection $u_n$ of the flow vector
on a direction $\mathbf n$, the possible flow vectors are confined to a half-plane bounded by line e
in Figure 4, and the FOE is to be found in the complementary half-plane. Therefore the
estimation of the translational direction can be implemented by simply voting for a
half-plane at every point. The best solution corresponds to the location with the highest
number of votes.


Figure 4: The translational flow vector $\mathbf u_t$ has its tip anywhere along the line perpendicular
to $\mathbf n$ through the tip of $u_n\mathbf n$. The focus of expansion lies on the (shaded) half-plane
defined by the line e that does not contain possible vectors $\mathbf u_t$.

Estimation of purely translational motion is much simpler than estimation of complete
3D rigid motion, which requires techniques that decouple the translation from
the rotation in some way and, if designed on the basis of the constraint of positive
depth, require voting in higher dimensions [11, 12, 13].

As  demonstrated  in  the  forthcoming  analysis,  however,  a  simple  algorithm  designed 
for  translation  only  will  find  the  correct  solution.  Thus  insects  with  spherical  eyes, 
such  as  bees  and  flies,  have  a  big  advantage  in  the  task  of  3D  motion  estimation. 

(b) On the other hand, if we assume a certain error in the estimation of translation on
a spherical image, we find that the vector of the rotational error $\boldsymbol\omega_e$ lies on the same
geodesic as the real translation $\mathbf t$ and the estimated translation $\hat{\mathbf t}$, at equal distance
from both, that is, $(\mathbf t + \hat{\mathbf t})\times\boldsymbol\omega_e = \mathbf 0$ ($\mathbf t$ and $\hat{\mathbf t}$ are unit vectors).

(c) Considering as imaging surface a plane of limited extent, we find that the translational
and rotational errors are perpendicular to each other. Using the notation
$\frac{U}{W}f = x_0$ and $\frac{V}{W}f = y_0$, with $x_{0_e}$ and $y_{0_e}$ denoting the errors in $x_0$ and $y_0$, this means
that $\frac{x_{0_e}}{y_{0_e}} = -\frac{\beta_e}{\alpha_e}$. If we fix the rotational error, this provides us with a constraint on the
direction of the translational error.

(d) If we fix the translational error $(x_{0_e}, y_{0_e})$, we obtain the same constraint, and in
addition we find that $\gamma_e = 0$.
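The half-plane voting described under result (a) can be sketched as a brute-force search over a grid of candidate FOE positions. This is a generic illustration with hypothetical names and synthetic noise-free data: a planar retina with f = 1 and an approaching observer with W = 1, for whom the translational flow from equation (1) is $(W/Z)(\mathbf p - \text{FOE})$. It is not one of the algorithms of [2, 20, 29, 34].

```python
import numpy as np


def vote_foe(points, normals, u_n, candidates):
    """Count, for every candidate FOE, how many half-plane constraints it satisfies.

    For an approaching observer the translational flow at p points away from the FOE,
    so sign((p - c) . n) must equal sign(u_n) for the true FOE c.
    """
    diff = points[None, :, :] - candidates[:, None, :]            # shape (C, P, 2)
    proj = np.sum(diff * normals[None, :, :], axis=2)             # (p - c) . n
    return np.sum(np.sign(proj) == np.sign(u_n)[None, :], axis=1)


# Hypothetical synthetic data: random image points, depths, and measurement directions.
gen = np.random.default_rng(1)
foe_true = np.array([0.2, -0.1])
points = gen.uniform(-1.0, 1.0, size=(500, 2))
angles = gen.uniform(0.0, 2 * np.pi, size=500)
normals = np.stack([np.cos(angles), np.sin(angles)], axis=1)
depths = gen.uniform(2.0, 10.0, size=500)
flow = (points - foe_true) / depths[:, None]         # (W/Z)(p - FOE) with W = 1
u_n = np.sum(flow * normals, axis=1)                 # normal flow along each direction

grid = np.linspace(-1.0, 1.0, 41)
candidates = np.array([[x, y] for x in grid for y in grid])
votes = vote_foe(points, normals, u_n, candidates)
best = candidates[np.argmax(votes)]
print(best, votes.max())
```

With exact data the true FOE satisfies every constraint. With noisy measurements one would still take the location with the most votes, as the text describes.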

The results developed in this paper have a clear relationship with those of existing
error analyses as described in Section 2.2, with the exception of the bias of translation
towards the viewing direction, since this result has been obtained on the basis
of particular algorithms and image measurement configurations.

Regarding the confusion between translation and rotation, it has been experimentally
observed, and proven for simple scene structures, restricted fields of view,
and certain estimation techniques using particular statistical estimators, that
translation along the x axis is coupled with rotation around the y axis and that
translation along the y axis is coupled with rotation around the x axis. This can
be explained using the constraint we have developed. If $\beta_e$ changes, the constraint
$\frac{x_{0_e}}{y_{0_e}} = -\frac{\beta_e}{\alpha_e}$ remains intact if $x_{0_e}$ is changed appropriately. Similarly, an error in $\alpha_e$
can be hidden in $y_{0_e}$. This, however, is not true in general: an error in $\beta_e$ could
be coupled with an error in $y_{0_e}$ or in both $x_{0_e}$ and $y_{0_e}$. The only condition that
must be satisfied is the perpendicularity between the translational and rotational
errors; the confusions between x-translation and y-rotation, and between y-translation
and x-rotation, are not decoupled.
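Writing the perpendicularity condition of result (c) as an inner product makes these couplings explicit. The short derivation below treats $(\alpha_e, \beta_e)$ and $(x_{0_e}, y_{0_e})$ as the image-plane projections of the rotational and translational errors:

```latex
\begin{aligned}
(x_{0_e}, y_{0_e}) \cdot (\alpha_e, \beta_e) = 0
  \;&\Longleftrightarrow\; x_{0_e}\alpha_e + y_{0_e}\beta_e = 0
  \;\Longleftrightarrow\; (x_{0_e}, y_{0_e}) = \lambda\,(-\beta_e, \alpha_e),\ \lambda \in \mathbb{R},\\
\alpha_e = 0,\ \beta_e \neq 0 \;&\Longrightarrow\; y_{0_e} = 0
  \quad (\text{an error in } \beta_e \text{ pairs with a pure error in } x_{0_e}),\\
\beta_e = 0,\ \alpha_e \neq 0 \;&\Longrightarrow\; x_{0_e} = 0
  \quad (\text{an error in } \alpha_e \text{ pairs with a pure error in } y_{0_e}),\\
\alpha_e \neq 0,\ \beta_e \neq 0,\ \lambda \neq 0 \;&\Longrightarrow\;
  x_{0_e} \neq 0,\ y_{0_e} \neq 0
  \quad (\text{both translational components are in error}).
\end{aligned}
```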

Regarding the distribution of the global minima of the objective function derived
from the epipolar constraint, there is an interesting connection. Given a particular
rotational error, our result shows perpendicularity of the translational and
rotational errors irrespective of the scene in view. Thus, all possible estimated
translations are to be found on a line passing through the real translation normal
to the rotational error. In the analysis of Maybank, various assumptions are made
that can be interpreted as introducing a rotational error. Therefore, the line found
in [26, 27] should be perpendicular to the rotational error due to these assumptions.

The importance of the results obtained for the plane also lies in their consequences
for shape estimation. They can be translated into the statement that planar retinas
with high resolution at the center are advantageous in the computation of shape.
As will be shown in Section 5, if the translational and rotational errors satisfy the
relation derived there, then near the fixation center, for any depth Z, the distortion
factor is approximately the same for every flow direction! This means that all scene
points of the same depth are distorted by the same factor, and thus a depth map is
derived whose level contours are the correct ones!
3  Analysis  on  the  Sphere
3.1  Fixed  rotational  error 

We need a parameterization for expressing all possible orientations $\mathbf n$ tangent to the
sphere at every point. One way to achieve this that is convenient for our problem is
through the selection of an arbitrary unit vector $\mathbf s$. Given $\mathbf s$, at each point $\mathbf r$ of the
sphere, the vector $\frac{\mathbf s\times\mathbf r}{|\mathbf s\times\mathbf r|}$ defines a direction in the tangent plane. As $\mathbf s$ varies along half a
great circle, $\frac{\mathbf s\times\mathbf r}{|\mathbf s\times\mathbf r|}$ takes on every possible orientation in the tangent plane at every point $\mathbf r$,
with the exception of the set of points $\mathbf r$ lying on the great circle of $\mathbf s$, which is of measure
zero. To facilitate the analysis, we choose $\mathbf s$ perpendicular to $\boldsymbol\omega_e$.

As shown in Figure 5, let $\boldsymbol\omega_e$ be parallel to the x axis and let $\mathbf s$ be the set of all the unit
vectors in the yz plane, with $\mathbf s = (0, \sin\chi, \cos\chi)$ and $\chi$ in the interval $[0, \pi]$. The flow
directions $\mathbf n$ at every point are defined as $\mathbf n = \frac{\mathbf s\times\mathbf r}{|\mathbf s\times\mathbf r|}$. This parameterization, however, does
not treat all orientations equally (as $\mathbf s$ varies along a great circle with constant speed, $\mathbf s\times\mathbf r$
accelerates and decelerates). Thus, in order to obtain a uniform distribution we must
perform some normalization. Luckily, however, this normalization does not complicate
matters in the following proof because, due to symmetry, its behavior with regard to
monotonicity is the same as that of the volumes of negative depth in space for the
functions considered.


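Figure 5: Parameterization used in the analysis: $\boldsymbol\omega_e = \lambda(1, 0, 0)$, $\mathbf s = (0, \sin\chi, \cos\chi)$ with
$\chi\in[0, \pi]$, $\mathbf n = \frac{\mathbf s\times\mathbf r}{|\mathbf s\times\mathbf r|}$.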
We assume a uniform distribution for the directions $n$. Thus, in order to obtain the negative range volume $V_n$, we have to integrate the individual volumes in each direction over all directions. If $\psi \in [0, \pi]$ provides a uniform parameterization for $n$, as given in Appendix A, $V(\psi)$ is the volume for a single direction $n(\psi)$, and $\chi$ is the parameterization for $n$ as defined above, the following transformation applies:

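$$V_n = \int_0^{\pi} V(\psi)\,d\psi = \int_{g^{-1}(0)}^{g^{-1}(\pi)} V\big(g(\chi)\big)\,\frac{dg(\chi)}{d\chi}\,d\chi$$

where $\psi = g(\chi)$. For this parameterization the normalization term is

$$\frac{dg(\chi)}{d\chi} = \frac{\sin\varphi_y}{\cos^2\varphi_y\,\cos^2(\chi - \varphi_x) - 1} \qquad (8)$$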


where $\varphi_y$ is the angle between vector $r$ and the $yz$ plane, and $\varphi_x$ is the angle between the projection of $r$ on the $yz$ plane and some fiducial direction in the $yz$ plane. A derivation is given in Appendix A.
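Equation (8) can be spot-checked numerically. The following is a minimal NumPy sketch, not part of the original derivation: it measures the orientation of $n = \frac{s\times r}{|s\times r|}$ in a fixed tangent frame at $r$, differentiates it with respect to $\chi$ by central differences, and compares the result with the closed form. It assumes $\omega_\epsilon$ along the $x$ axis and takes the $z$ axis as the fiducial direction for $\varphi_x$ (the text leaves this direction open); all function names are hypothetical.

```python
import numpy as np


def point_on_sphere(phi_x, phi_y):
    """Unit vector r at angle phi_y from the yz plane; its yz projection makes
    angle phi_x with the z axis (the z axis is an assumed fiducial direction)."""
    return np.array([np.sin(phi_y),
                     np.cos(phi_y) * np.sin(phi_x),
                     np.cos(phi_y) * np.cos(phi_x)])


def flow_orientation(chi, r):
    """Angle of n = (s x r)/|s x r|, s = (0, sin chi, cos chi), in a fixed tangent frame at r."""
    s = np.array([0.0, np.sin(chi), np.cos(chi)])
    n = np.cross(s, r)
    n /= np.linalg.norm(n)
    helper = np.array([1.0, 0.0, 0.0]) if abs(r[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(r, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(r, e1)                      # e1 x e2 = r: angles grow counterclockwise about r
    return np.arctan2(n @ e2, n @ e1)


def normalization_term(chi, phi_x, phi_y):
    """Closed form of equation (8)."""
    return np.sin(phi_y) / (np.cos(phi_y) ** 2 * np.cos(chi - phi_x) ** 2 - 1.0)


def numerical_term(chi, phi_x, phi_y, h=1e-6):
    """Central finite difference of the orientation of n with respect to chi."""
    r = point_on_sphere(phi_x, phi_y)
    diff = flow_orientation(chi + h, r) - flow_orientation(chi - h, r)
    diff = (diff + np.pi) % (2.0 * np.pi) - np.pi   # undo the 2*pi wrap of arctan2
    return diff / (2.0 * h)


for chi, phi_x, phi_y in [(0.3, 1.1, 0.4), (2.0, -0.5, -0.9), (1.4, 2.7, 1.2)]:
    print(numerical_term(chi, phi_x, phi_y), normalization_term(chi, phi_x, phi_y))
```

The fixed frame only shifts the measured angle by a constant, so the derivative does not depend on how the frame is chosen.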


Figure 6: Parameterization of $r$: $\varphi_y$ is the angle between $r$ and the $yz$ plane; $\varphi_x$ is the angle between the projection of $r$ on the $yz$ plane and some direction in the $yz$ plane.


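Our focus is on the points in space with negative estimated range values $\hat{R}$. Since $n = \frac{s \times r}{|s \times r|}$ and $s \cdot \omega_\epsilon = 0$, we obtain from equation (5), by setting $f = 1$,

$$\hat{R} = |R|\,\frac{(\hat{t} \times s) \cdot r}{(t \times s) \cdot r - |R|\,(\omega_\epsilon \cdot r)(s \cdot r)} < 0 \qquad (9)$$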


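From this inequality the following constraint on $|R|$ can be derived:

$$\operatorname{sgn}\big((\hat{t} \times s) \cdot r\big) = -\operatorname{sgn}\big((t \times s) \cdot r - |R|\,(\omega_\epsilon \cdot r)(s \cdot r)\big) \qquad (10)$$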

At any point $r$ in the image this constraint is either satisfied for all values $|R|$, or it is satisfied for an interval of values $|R|$ bounded from either above or below, or it is not satisfied for any value at all. Thus, inequality (9) provides a classification of the points on the sphere, and we obtain four different kinds of areas (types I-IV). The locations of these areas are defined by the signs of the functions $(t \times s) \cdot r$, $(\hat{t} \times s) \cdot r$ and $(\omega_\epsilon \cdot r)(s \cdot r)$, as summarized in Table 1.


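Table 1:

| area | location | constraint on $\lvert R\rvert$ |
|---|---|---|
| I | $\operatorname{sgn}((t\times s)\cdot r) = \operatorname{sgn}((\hat t\times s)\cdot r) = \operatorname{sgn}((r\cdot\omega_\epsilon)(r\cdot s))$ | $\lvert R\rvert > \frac{(t\times s)\cdot r}{(r\cdot\omega_\epsilon)(r\cdot s)}$ |
| II | $-\operatorname{sgn}((t\times s)\cdot r) = \operatorname{sgn}((\hat t\times s)\cdot r) = \operatorname{sgn}((r\cdot\omega_\epsilon)(r\cdot s))$ | all $\lvert R\rvert$ |
| III | $\operatorname{sgn}((t\times s)\cdot r) = -\operatorname{sgn}((\hat t\times s)\cdot r) = \operatorname{sgn}((r\cdot\omega_\epsilon)(r\cdot s))$ | $\lvert R\rvert < \frac{(t\times s)\cdot r}{(r\cdot\omega_\epsilon)(r\cdot s)}$ |
| IV | $\operatorname{sgn}((t\times s)\cdot r) = \operatorname{sgn}((\hat t\times s)\cdot r) = -\operatorname{sgn}((r\cdot\omega_\epsilon)(r\cdot s))$ | none |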
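The sign analysis behind Table 1 can be cross-checked against inequality (9) directly. In the sketch below (hypothetical helper names; random test geometry with $\omega_\epsilon$ along $x$ and $s$ in the $yz$ plane, as assumed in this section), `area_type` reads off the area from the three signs in the table, `predicted_negative` applies the table's constraint on $|R|$, and the loop counts disagreements with the sign of $\hat R$ computed from (9).

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def estimated_range(R, r, s, t, t_hat, w_err):
    """Equation (9): signed range estimate R_hat (valid when s . w_err = 0)."""
    num = np.cross(t_hat, s) @ r
    den = np.cross(t, s) @ r - R * (w_err @ r) * (s @ r)
    return R * num / den


def area_type(r, s, t, t_hat, w_err):
    """Area type I-IV from the sign pattern in Table 1."""
    a = np.sign(np.cross(t_hat, s) @ r)    # sign of (t_hat x s) . r
    b = np.sign(np.cross(t, s) @ r)        # sign of (t x s) . r
    c = np.sign((w_err @ r) * (s @ r))     # sign of (w_err . r)(s . r)
    if a == b == c:
        return "I"
    if a == c != b:
        return "II"
    if b == c != a:
        return "III"
    return "IV"


def predicted_negative(R, r, s, t, t_hat, w_err):
    """Constraint on |R| from Table 1 for a negative estimate above r."""
    rho = (np.cross(t, s) @ r) / ((w_err @ r) * (s @ r))
    kind = area_type(r, s, t, t_hat, w_err)
    return {"I": R > rho, "II": True, "III": R < rho, "IV": False}[kind]


rng = np.random.default_rng(1)
mismatches = 0
for _ in range(5000):
    t, t_hat, r = (unit(rng.normal(size=3)) for _k in range(3))
    w_err = rng.uniform(0.1, 2.0) * np.array([1.0, 0.0, 0.0])   # omega_eps along the x axis
    chi = rng.uniform(0.0, np.pi)
    s = np.array([0.0, np.sin(chi), np.cos(chi)])                # so that s . omega_eps = 0
    R = rng.uniform(0.1, 10.0)
    negative = estimated_range(R, r, s, t, t_hat, w_err) < 0
    mismatches += int(negative != predicted_negative(R, r, s, t, t_hat, w_err))
print("mismatches:", mismatches)
```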

Thus for any direction $n$ defined by a certain $s$, we obtain a volume of negative range values consisting of the volumes above areas I, II, and III. An illustration for both hemispheres is given in Figure 7. As can be seen, areas II and III cover the same amount of area, which has the size of the area between the two great circles $(t \times s) \cdot r = 0$ and $(\hat{t} \times s) \cdot r = 0$, and area I covers a hemisphere minus the area between $(t \times s) \cdot r = 0$ and $(\hat{t} \times s) \cdot r = 0$.
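The statement about the sizes of the areas can be checked by Monte Carlo sampling. This sketch assumes uniformly distributed points on the sphere and uses the standard fact that two great circles whose normals make an angle $\phi$ give different signs on a fraction $\phi/\pi$ of the sphere; areas II and III should each take half of that fraction, and areas I and IV each half of the rest. The vectors are arbitrary test values and the function names are hypothetical.

```python
import numpy as np


def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def area_fractions(t, t_hat, s, w_err, n_samples=400_000, seed=0):
    """Monte Carlo estimate of the fraction of the sphere covered by areas I-IV."""
    rng = np.random.default_rng(seed)
    r = rng.normal(size=(n_samples, 3))
    r /= np.linalg.norm(r, axis=1, keepdims=True)      # uniform points on the sphere
    a = np.sign(r @ np.cross(t_hat, s))
    b = np.sign(r @ np.cross(t, s))
    c = np.sign((r @ w_err) * (r @ s))
    return {
        "I": np.mean((a == b) & (b == c)),
        "II": np.mean((a == c) & (a != b)),
        "III": np.mean((b == c) & (a != b)),
        "IV": np.mean((a == b) & (b != c)),
    }


t = unit([0.3, 0.5, 0.8])
t_hat = unit([0.6, 0.1, 0.8])
s = unit([0.0, 0.6, 0.8])
w_err = np.array([0.5, 0.0, 0.0])
fractions = area_fractions(t, t_hat, s, w_err)

# Angle between the normals of the great circles (t x s).r = 0 and (t_hat x s).r = 0
n1, n2 = unit(np.cross(t, s)), unit(np.cross(t_hat, s))
lune = np.arccos(np.clip(n1 @ n2, -1.0, 1.0)) / np.pi   # fraction of the sphere where the two signs differ
print(fractions, "expected II = III =", lune / 2, "I = IV =", (1 - lune) / 2)
```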

If the scene in view is unbounded, that is, $|R| \in [0, \infty]$, there is a range of values $|R|$ above any point $r$ in areas I and III which results in negative range estimates. If we consider a lower bound $|R_{\min}| \ne 0$ and an upper bound $|R_{\max}| \ne \infty$, we obtain two additional curves $C_{\min} = 0$ and $C_{\max} = 0$, with $C_{\min} = (t \times s) \cdot r - |R_{\min}|\,(\omega_\epsilon \cdot r)(s \cdot r)$ and $C_{\max} = (t \times s) \cdot r - |R_{\max}|\,(\omega_\epsilon \cdot r)(s \cdot r)$, as bounds for areas with negative range values (as shown in Figure 7). As can be seen, the curves $C_{\min} = 0$, $C_{\max} = 0$, $(t \times s) \cdot r = 0$ and $(\omega_\epsilon \cdot r)(s \cdot r) = 0$ intersect at the same point.

In area I, we do not obtain any volume of negative range estimates for points $r$ between the curves $(\omega_\epsilon \cdot r)(s \cdot r) = 0$ and $C_{\max} = 0$; the volume for points $r$ between $C_{\min} = 0$ and $C_{\max} = 0$ is bounded from below by $|R| = \frac{(t \times s) \cdot r}{(\omega_\epsilon \cdot r)(s \cdot r)}$ (and from above by $|R_{\max}|$), and the volume for points $r$ between $C_{\min} = 0$ and $(t \times s) \cdot r = 0$ extends from $|R_{\min}|$ to $|R_{\max}|$. In area III we do not obtain any volume for points $r$ between $(t \times s) \cdot r = 0$ and $C_{\min} = 0$. The volume for points $r$ between $C_{\min} = 0$ and $C_{\max} = 0$ is bounded from above by $|R| = \frac{(t \times s) \cdot r}{(\omega_\epsilon \cdot r)(s \cdot r)}$ (and from below by $|R_{\min}|$), and the volume for points $r$ between $C_{\max} = 0$ and $(\omega_\epsilon \cdot r)(s \cdot r) = 0$ extends from $|R_{\min}|$ to $|R_{\max}|$.

We are given $\omega_\epsilon$ and $t$, and we are interested in the $\hat{t}$ which minimizes the negative range volume. For any $s$ the corresponding negative range volume becomes smallest if $\hat{t}$ is on the great circle of $t$ and $s$, that is, $(t \times s) \cdot \hat{t} = 0$, as will be shown next.
Figure 7: Classification of image points according to constraints on $|R|$. At $C_{\min}$ and $C_{\max}$, $|R|$ is constrained to be greater (area I) or smaller (area III) than $|R_{\min}|$ or $|R_{\max}|$. The two hemispheres correspond to the front of the sphere and the back of the sphere, both as seen from the front of the sphere.


Let us consider a $\hat{t}$ such that $(t \times s) \cdot \hat{t} \ne 0$ (i.e., $\hat{t}$ does not lie on the great circle defined by $t$ and $s$) and let us change $\hat{t}$ such that $(t \times s) \cdot \hat{t} = 0$. As $\hat{t}$ changes, the area of type II on the sphere becomes an area of type IV and the area of type III becomes an area of type I. Thus, the negative range volume obtained consists only of range values above areas of type I.

Let us use the following notation. $A_{III \to I}$ denotes the area which changes from type III to type I, and $V_{III}$ and $V_{I(III)}$ are the volumes before and after the change. Similarly, $A_{II \to IV}$ denotes the area which changes from type II to type IV, and $V_{II}$ and $V_{IV}$ are the corresponding volumes.

The change of $\hat{t}$ does not have any effect on the volumes above the areas that did not change in type, as can be seen from the constraint on $|R|$ in Table 1. However, the change of $\hat{t}$ causes a decrease in the volume above the areas which changed in type: $V_{I(III)} < V_{II}$, while $V_{IV} = 0$. Furthermore, as can be seen from equation (8), the normalization term is the same for points $r_1(\varphi_{x1}, \varphi_{y1})$ and $r_2(\varphi_{x2}, \varphi_{y2})$ symmetric with respect to the great circle $s \cdot r = 0$, because $\varphi_{y1} = \varphi_{y2}$ and $\cos^2(\chi - \varphi_{x1}) = \cos^2(\chi - \varphi_{x2})$. Thus we encounter the same normalization factors in areas $A_{III \to I}$ and $A_{II \to IV}$.

The volume of negative range values for any $s$ is smallest for $(t \times s) \cdot \hat{t} = 0$, independent of the range of values in which the scene lies. If we assume an upper bound $|R_{\max}| \ne \infty$, or a lower bound $|R_{\min}| \ne 0$, or both bounds on the scene in view, there exist points $r$ in areas I and III above which there are no range values which contribute to the negative range volume. However, as shown before, since the curves $C_{\min} = 0$, $C_{\max} = 0$, $(\omega_\epsilon \cdot r)(s \cdot r) = 0$ and $(t \times s) \cdot r = 0$ intersect at the same point, $V_{II}$ must always be larger than $V_{I(III)}$.

For any $s$ the smallest volume is obtained for $s$, $t$, and $\hat{t}$ lying on a great circle. Therefore, in order to minimize the total negative range volume $V_n$, we must have $\hat{t} = t$.

Thus, in summary, we have shown that for any given rotational error $\omega_\epsilon$ the negative range volume is smallest if the direction of the actual translation and the estimated translation coincide, that is, $\hat{t} = t$.
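The conclusion can be illustrated for a single $s$ with a Monte Carlo sketch. It measures volume the way the text does (sphere area times depth interval times the fraction of samples $(r, |R|)$ with $\hat R < 0$, with $|R|$ uniform in $[|R_{\min}|, |R_{\max}|]$) and ignores the normalization term. As an assumption of this sketch, each sample $r$ is paired with $-r$ at the same depth instead of with its mirror image in the great circle $s\cdot r = 0$; this pairing also matches every area-II point with an area-III point, so the comparison holds exactly on the sample rather than only on average. Names and test values are hypothetical.

```python
import numpy as np


def unit(v):
    return v / np.linalg.norm(v)


def negative_range_volume(t, t_hat, s, w_err, r, R, R_min, R_max):
    """Area-times-depth estimate of the negative range volume for one s:
    (sphere area) x (R_max - R_min) x (fraction of samples (r, R) with R_hat < 0)."""
    num = r @ np.cross(t_hat, s)
    den = r @ np.cross(t, s) - R * (r @ w_err) * (r @ s)
    negative = (R * num / den) < 0
    return 4.0 * np.pi * (R_max - R_min) * np.mean(negative)


rng = np.random.default_rng(2)
R_min, R_max = 1.0, 5.0
half = rng.normal(size=(100_000, 3))
half /= np.linalg.norm(half, axis=1, keepdims=True)
r = np.vstack([half, -half])                       # antithetic pairs r, -r
depth = rng.uniform(R_min, R_max, size=100_000)
R = np.concatenate([depth, depth])                 # each pair shares one depth sample

t = unit(np.array([0.2, 0.4, 0.9]))
w_err = np.array([0.3, 0.0, 0.0])                  # omega_eps along x
s = np.array([0.0, np.sin(1.0), np.cos(1.0)])      # s . omega_eps = 0

v_correct = negative_range_volume(t, t, s, w_err, r, R, R_min, R_max)
v_wrong = [negative_range_volume(t, unit(t + 0.5 * rng.normal(size=3)), s, w_err, r, R, R_min, R_max)
           for _ in range(5)]
print(v_correct, v_wrong)
```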

3.2  Fixed  translational  error 

In this section we choose the following parameterization: The unit vectors $t$ and $\hat{t}$ lie in the $xz$ plane, and $(t \cdot \hat{t}) > 0$; $s$ lies in the $xz$ plane with $s = (\sin\chi, 0, \cos\chi)$ and $\chi \in [0, \pi]$. As before, $n = \frac{s \times r}{|s \times r|}$. The normalization term necessary for the uniformity of the orientations $n$ in this parameterization can be obtained from equation (8) by substituting $\varphi_y$ for $\varphi_x$ and $\varphi_z$ for $\varphi_y$, that is,


$$\frac{dg(\chi)}{d\chi} = \frac{\sin\varphi_z}{\cos^2\varphi_z\,\cos^2(\chi - \varphi_y) - 1}$$

where $\varphi_z$ is the angle between $r$ and the $xz$ plane and $\varphi_y$ is the angle between the projection of $r$ on the $xz$ plane and some fiducial direction in the $xz$ plane.

We are given $t$ and $\hat{t}$, and we are interested in the direction of $\omega_\epsilon$ minimizing the negative range volume. In analogy to Section 3.1, we study the different areas on the sphere with unbounded and bounded intervals of estimated negative range values. Requiring that


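$\hat{R}$ be negative, we obtain the inequality

$$\hat{R} = |R|\,\frac{(\hat{t} \times s) \cdot r}{(t \times s) \cdot r - |R|\big((\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s)\big)} < 0 \qquad (11)$$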


and thus the three curves $(t \times s) \cdot r = 0$, $(\hat{t} \times s) \cdot r = 0$ and $g = (\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s) = 0$ separate areas I to IV.¹ The classification is analogous to the one in Table 1, except that the term $(\omega_\epsilon \cdot r)(s \cdot r)$ must be replaced by the term $(\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s)$. Figure 8 provides a pictorial description of this classification for the case of $s$ outside the interval $[t, \hat{t}]$, that is, $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$, and the case of $s$ between $t$ and $\hat{t}$, that is, $\operatorname{sgn}(t \times s) = -\operatorname{sgn}(\hat{t} \times s)$. If $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$, the sphere is separated into areas I and IV, each covering an area the size of a hemisphere; otherwise, the sphere is separated into areas II and III, again each the size of a hemisphere.

Again, if we consider a lower bound $|R_{\min}|$ and an upper bound $|R_{\max}|$ for the scene in view, we obtain the two curves $C_{\min} = (t \times s) \cdot r - |R_{\min}|\big((\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s)\big) = 0$ and $C_{\max} = (t \times s) \cdot r - |R_{\max}|\big((\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s)\big) = 0$, as shown in Figure 8. $C_{\min} = 0$ and $C_{\max} = 0$ separate the points $r$ in areas I and III into those with no volume of negative range values, those with a volume bounded by a value different from $|R_{\min}|$ and $|R_{\max}|$, and those with a volume ranging from $|R_{\min}|$ to $|R_{\max}|$.

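The proof is given in three parts. We decompose $\omega_\epsilon$ into a component $\omega_{\mathrm{par}}$ which lies in the $xz$ plane and a component $\omega_{\mathrm{perp}} = \lambda(0, -1, 0)$ parallel to the $y$ axis: $\omega_\epsilon = \omega_{\mathrm{par}} + \omega_{\mathrm{perp}}$. First, we show that if $\omega_{\mathrm{par}} = 0$, the smallest negative depth volume is obtained for $\omega_{\mathrm{perp}} = 0$. Second, we show that if $\omega_{\mathrm{perp}} = 0$, a vector $\omega_{\mathrm{par}} \ne 0$ with $(t \times \omega_{\mathrm{par}}) = -(\hat{t} \times \omega_{\mathrm{par}})$ is obtained, which we call $\omega_{\mathrm{par}_0}$, that provides the smallest negative range volume. Third, we prove that $\omega_\epsilon$, in order to minimize the negative range volume, must satisfy the constraint $(\omega_\epsilon \cdot t) = (\omega_\epsilon \cdot \hat{t})$. Moreover, if we change the direction of $\omega_\epsilon$, which amounts to $\omega_\epsilon = \omega_{\mathrm{par}_0} + \omega_{\mathrm{perp}}$ with $\omega_{\mathrm{perp}} = \lambda(0, -1, 0)$, by continuously increasing $\lambda > 0$, the negative depth volume increases monotonically, and thus the smallest negative depth volume is obtained for $\omega_{\mathrm{par}_0}$. The details of the proofs will now be given.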


Part 1 ($\omega_{\mathrm{perp}}$ minimizing the negative range volume)

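Let $\omega_{\mathrm{par}} = 0$; then $g = (\omega_{\mathrm{perp}} \cdot r)(s \cdot r) = 0$, and the curves $C_i = 0$ for $i \in \{\min, \max\}$ become

$$C_i = (t \times s) \cdot r - |R_i|\,(\omega_{\mathrm{perp}} \cdot r)(s \cdot r) = 0$$

Since $(t \times s) \times \omega_{\mathrm{perp}} = 0$, with $t \times s$ pointing in the direction of $\omega_{\mathrm{perp}}$,

$$C_i = |R_i|\,(\omega_{\mathrm{perp}} \cdot r)\left(\frac{\sin\angle(t, s)\,|t|}{|R_i|\,|\omega_{\mathrm{perp}}|} - (s \cdot r)\right) = 0$$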


where $\sin\angle(t, s)$ denotes the sine of the angle between vectors $t$ and $s$. Curve $C_i = 0$ thus consists of the great circle $\omega_{\mathrm{perp}} \cdot r = 0$ and the circle $\frac{\sin\angle(t, s)\,|t|}{|R_i|\,|\omega_{\mathrm{perp}}|} - (s \cdot r) = 0$ parallel to the great circle $(s \cdot r) = 0$. If $\frac{\sin\angle(t, s)\,|t|}{|R_i|\,|\omega_{\mathrm{perp}}|} > 1$, this circle disappears. Figure 9 provides a pictorial description of the areas I-IV and the curves $C_i = 0$.

¹ Curve $g = 0$ is of the same form as the zero-motion contour defined in [13], which got its name from the fact that it defines the locus of points for which a flow field due to rigid motion with translation $s$ and rotation $\omega$ can take on the value zero.

Figure 8: Classification of image points for general $\omega_\epsilon$. (a), (b): $\operatorname{sgn}(t \times s) \cdot r = \operatorname{sgn}(\hat{t} \times s) \cdot r$. (c), (d): $\operatorname{sgn}(t \times s) \cdot r = -\operatorname{sgn}(\hat{t} \times s) \cdot r$. If $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$, the negative estimated range values are in area I above the area defined by curves $C_{\max} = 0$ and $(t \times s) \cdot r = 0$. If $\operatorname{sgn}(t \times s) = -\operatorname{sgn}(\hat{t} \times s)$, the negative estimated range values are above area II and in area III above the area defined by curves $C_{\min} = 0$ and $(\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s) = 0$.


Figure 9: Classification of image points for $\omega_\epsilon = \omega_{\mathrm{perp}}$.
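The factorization of $C_i$ used in Part 1 can be confirmed numerically. The sketch assumes $t$ and $s$ in the $xz$ plane and $\omega_{\mathrm{perp}} = \lambda(0, -1, 0)$; instead of fixing the orientation of $t \times s$, it uses the signed factor $k = \frac{(t\times s)\cdot\omega_{\mathrm{perp}}}{|\omega_{\mathrm{perp}}|^2}$, whose magnitude is $\frac{\sin\angle(t,s)|t|}{|\omega_{\mathrm{perp}}|}$, so the circle $s\cdot r = k/|R_i|$ exists only when $|k|/|R_i| \le 1$. Function names and test values are hypothetical.

```python
import numpy as np


def c_curve(r, t, s, w_err, R_i):
    """C_i = (t x s) . r - |R_i| ((w_err . r)(s . r) - (w_err . s)) from Section 3.2."""
    return np.cross(t, s) @ r - R_i * ((w_err @ r) * (s @ r) - w_err @ s)


def c_curve_factored(r, t, s, w_perp, R_i):
    """Factored form for w_err = w_perp parallel to the y axis and t, s in the xz plane."""
    k = (np.cross(t, s) @ w_perp) / (w_perp @ w_perp)   # = +/- sin(angle(t, s)) |t| / |w_perp|
    return (w_perp @ r) * (k - R_i * (s @ r))


rng = np.random.default_rng(3)
a, chi, lam, R_i = 0.4, 1.3, 0.7, 2.5
t = np.array([np.sin(a), 0.0, np.cos(a)])          # t in the xz plane
s = np.array([np.sin(chi), 0.0, np.cos(chi)])      # s in the xz plane
w_perp = lam * np.array([0.0, -1.0, 0.0])
for _ in range(3):
    r = rng.normal(size=3)
    r /= np.linalg.norm(r)
    print(c_curve(r, t, s, w_perp, R_i), c_curve_factored(r, t, s, w_perp, R_i))

k = (np.cross(t, s) @ w_perp) / (w_perp @ w_perp)
print(abs(k), np.sin(abs(chi - a)) / lam)          # both equal sin(angle(t, s)) |t| / |w_perp|
```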

Let us consider two flow directions defined by vectors $s_1$ and $s_2$ that are symmetric with regard to $t$, that is, $(s_1 \times t) = -(s_2 \times t)$, and let $s_1$ be between $t$ and $\hat{t}$. For every point $r_1$ in area III defined by $s_1$ there exists a point $r_2$ in area I defined by $s_2$ with the same normalization factor such that the negative estimated ranges above $r_1$ and $r_2$ add up to $|R_{\max}| - |R_{\min}|$. Thus, the volume of negative estimated range obtained from $s_1$ and $s_2$, denoted by $V(s_1)$ and $V(s_2)$ respectively, amounts to the area of the sphere $A_{sp}$ times $|R_{\max}| - |R_{\min}|$, that is, $V(s_1) + V(s_2) = A_{sp}(|R_{\max}| - |R_{\min}|)$ (area II of $s_1$ contributes a hemisphere; area III of $s_1$ and area I of $s_2$ together contribute a hemisphere). Let us decompose the negative range volume and provide the following definition:

$$V_n = V_A + V_B \quad\text{and}\quad V_A = V_{A'} + V_{A''}$$

$V_A$ corresponds to the volume obtained from all $s$ with $|t \cdot s| > |t \cdot \hat{t}|$. $V_{A'}$ corresponds to those $s$ with $\operatorname{sgn}(t \times s) = -\operatorname{sgn}(\hat{t} \times s)$ (that is, all $s$ between $t$ and $\hat{t}$) and $V_{A''}$ corresponds to those $s$ with $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$ (that is, the set of $s$ symmetric in $t$ to the set of $s$ in $V_{A'}$). $V_B$ corresponds to the volume from all the remaining $s$. $V_{A'}$ consists of range values above areas of type III and II, and $V_{A''}$ and $V_B$ consist of range values above areas of type I only.

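$V_A = V_{A'} + V_{A''} = A_{sp}(|R_{\max}| - |R_{\min}|)\,\angle(t, \hat{t})$ and $V_B \ge 0$. $V_B = 0$ if for all $s$ contributing to volume $V_B$ we have $\frac{\sin\angle(t, s)\,|t|}{|R_{\max}|\,|\omega_{\mathrm{perp}}|} \ge 1$, that is, if

$$\frac{\sin\angle(t, \hat{t})\,|t|}{|R_{\max}|\,|\omega_{\mathrm{perp}}|} \ge 1$$

Thus, in summary, if $\omega_{\mathrm{par}} = 0$, the minimum negative range volume $V_n$ is equal to $A_{sp}(|R_{\max}| - |R_{\min}|)\,\angle(t, \hat{t})$ and is obtained for all $\omega_{\mathrm{perp}}$ with $|\omega_{\mathrm{perp}}| \le \frac{\sin\angle(t, \hat{t})\,|t|}{|R_{\max}|}$. If $|R_{\max}| = \infty$ or $(t \cdot \hat{t}) = 1$, there is only one solution, $\omega_{\mathrm{perp}} = 0$.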

Part 2 ($\omega_{\mathrm{par}}$ minimizing the negative range volume)

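If $\omega_{\mathrm{perp}} = 0$, curve $g = (\omega_{\mathrm{par}} \cdot r)(s \cdot r) - (\omega_{\mathrm{par}} \cdot s) = 0$ is symmetric with respect to the $xz$ plane and the curves $C_i = 0$ for $i \in \{\min, \max\}$ become (see Figure 10)

$$C_i = (t \times s) \cdot r - |R_i|\big((\omega_{\mathrm{par}} \cdot r)(s \cdot r) - (\omega_{\mathrm{par}} \cdot s)\big) = 0$$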

Let us fix $s$ and $|\omega_{\mathrm{par}}|$ and vary the direction of $\omega_{\mathrm{par}}$. As $\angle(s, \omega_{\mathrm{par}})$ increases, the area between $C_{\min} = 0$ and $C_{\max} = 0$ multiplied by the normalization factor, and the area between $C_{\min} = 0$ and $(t \times s) \cdot r = 0$ multiplied by the normalization factor, decrease. This can be verified by numerical integration. It can also be understood from the following observation: referring to Figure 11, we see that if $(s \cdot \omega_{\mathrm{par}}) \ne 0$, an increase in $\angle(s, \omega_{\mathrm{par}})$ causes an increase in the size of curve $g = 0$. Therefore, the area between $C_i = 0$ and $(t \times s) \cdot r = 0$ in the left hemisphere increases, but the area between $C_i = 0$ and $(t \times s) \cdot r = 0$ in the right hemisphere decreases by a larger amount, since the area inside curve $g = 0$ is smaller than the area inside curve $(t \times s) \cdot r = 0$. Furthermore, the normalization factors in the area added in the left hemisphere are smaller than the normalization factors in the area lost in the right hemisphere. Therefore, if $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$, the negative range volume above area I decreases, and if $\operatorname{sgn}(t \times s) = -\operatorname{sgn}(\hat{t} \times s)$, the negative range volume above area III increases as $\angle(s, \omega_{\mathrm{par}})$ increases.

The negative range volume $V(s)$ for each $s$ can be decomposed into a component $V_1(s)$ dependent on $(t \times s) \cdot r$ and a component $V_2(s)$ dependent on $|R_i|\big((\omega_{\mathrm{par}} \cdot r)(s \cdot r) - (\omega_{\mathrm{par}} \cdot s)\big)$. The overall negative range volume $V_n$ is obtained by integrating $V(s)$ over all $s$, that is, $V_n = \int_s V(s)\,ds = \int_s V_1(s)\,ds + \int_s V_2(s)\,ds$. Since $V_1(s)$ does not depend on $\omega_{\mathrm{par}}$, $V_n$ will attain its minimum at the minimum of $\int_s V_2(s)\,ds$, which is achieved when $\omega_{\mathrm{par}}$ is between $t$ and $\hat{t}$ at an equal distance from both, that is, if $(t \times \omega_{\mathrm{par}}) = -(\hat{t} \times \omega_{\mathrm{par}})$.

Figure 10: Classification of image points for $\omega_\epsilon = \omega_{\mathrm{par}}$.

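Let us denote by $\omega_{\mathrm{par}_0}$ the $\omega_{\mathrm{par}}$ minimizing the negative range volume $V_n$. We have found the direction of $\omega_{\mathrm{par}_0}$; it remains to be shown that $|\omega_{\mathrm{par}_0}| \ne 0$.

For $\omega_{\mathrm{par}_0} = 0$, $V_n = A_{sp}(|R_{\max}| - |R_{\min}|)\,\angle(t, \hat{t})$, where $V_n = V_A + V_B = V_{A'} + V_{A''} + V_B$. Since for any $s_1$ with $\operatorname{sgn}(t \times s_1) = -\operatorname{sgn}(\hat{t} \times s_1)$ and any $s_2$ with $\operatorname{sgn}(t \times s_2) = \operatorname{sgn}(\hat{t} \times s_2)$ we have $V(s_1) + V(s_2) < A_{sp}(|R_{\max}| - |R_{\min}|)$, it follows that volume $V_A = V_{A'} + V_{A''} < A_{sp}(|R_{\max}| - |R_{\min}|)\,\angle(t, \hat{t})$. Volume $V_B \ge 0$. If $(t \cdot \hat{t}) = 0$ (that is, $\angle(t, \hat{t}) = \pi/2$), volume $V_B = 0$, and thus there exists an $\omega_{\mathrm{par}_0} \ne 0$ minimizing the negative range volume. Since, due to the symmetry on the sphere, $|\omega_{\mathrm{par}_0}|$ must change monotonically as $(t \cdot \hat{t})$ increases, we conclude that $\omega_{\mathrm{par}_0} \ne 0$ for all $(t \cdot \hat{t}) > 0$.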

Part 3 ($\omega_\epsilon = \omega_{\mathrm{par}} + \omega_{\mathrm{perp}}$)

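Let us consider $\omega_\epsilon = \omega_{\mathrm{par}} + \omega_{\mathrm{perp}}$, with $\omega_{\mathrm{par}}$ a component in the $xz$ plane and $\omega_{\mathrm{perp}} = \lambda(0, -1, 0)$ parallel to the $y$ axis. Therefore,

$$g = (\omega_{\mathrm{perp}} \cdot r)(s \cdot r) + (\omega_{\mathrm{par}} \cdot r)(s \cdot r) - (\omega_{\mathrm{par}} \cdot s) = 0$$

and

$$C_i = (t \times s) \cdot r - |R_i|\big((\omega_{\mathrm{perp}} \cdot r)(s \cdot r) + (\omega_{\mathrm{par}} \cdot r)(s \cdot r) - (\omega_{\mathrm{par}} \cdot s)\big) = 0$$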

Let us fix $|\omega_{\mathrm{par}}| = 1$ and change $\omega_\epsilon$ by increasing $|\omega_{\mathrm{perp}}|$ and thus increasing $\lambda$. If $(\omega_{\mathrm{par}} \cdot s) > 0$, an increase in $\lambda$ causes a decrease in the area between $C_{\min} = 0$ and $C_{\max} = 0$ times the normalization factor, and a decrease in the area between $C_{\min} = 0$ and $(t \times s) \cdot r = 0$ times the normalization factor. Again, this can be verified either by numerical integration or through the following observation in reference to Figure 12: an increase in $\lambda$ causes an increase in the size of $g = 0$. Since $g = 0$ is not symmetric about the $xz$ plane, the area between $C_i = 0$ and $(t \times s) \cdot r = 0$ in the left hemisphere times the normalization factor increases, but the area between $C_i = 0$ and $(t \times s) \cdot r = 0$ in the right hemisphere times the normalization factor decreases by a larger amount.

Figure 11: As $\omega_{\mathrm{par}}$ changes from $\omega_1$ to $\omega_2$ and $\angle(\omega_{\mathrm{par}}, s)$ increases, curve $g = 0$ changes from $g_1$ to $g_2$ and curve $C_i = 0$ changes from $C_1$ to $C_2$. In areas of type I the volume between $C_i = 0$ and $(t \times s) \cdot r = 0$ decreases, and in areas of type III the volume between $C_i = 0$ and $g = 0$ increases.

Thus, an increase in $\lambda$ has the effect that for all $s$ with $\operatorname{sgn}(t \times s) = -\operatorname{sgn}(\hat{t} \times s)$ the negative range volume above area III increases, and for all $s$ with $\operatorname{sgn}(t \times s) = \operatorname{sgn}(\hat{t} \times s)$ the volume above area I decreases. It can further be observed that the larger $(\omega_{\mathrm{par}} \cdot s)$, the larger the decrease in the volume above area I (or the larger the increase in the volume above area III), with a peak for $(\omega_{\mathrm{par}} \cdot s) = 1$.

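Using the same argument as before, that the negative range volume $V_n$ can be decomposed into a component due to $(t \times s) \cdot r$ only and a component due to $|R_i|\big((\omega_\epsilon \cdot r)(s \cdot r) - (\omega_\epsilon \cdot s)\big)$ only, we find that the volume above areas of type III must be as small as possible, and thus $(\omega_\epsilon \cdot t) = (\omega_\epsilon \cdot \hat{t})$. Therefore $\omega_\epsilon$ can only be of the form

$$\omega_\epsilon = \mu\,(\omega_{\mathrm{par}_0} + \omega_{\mathrm{perp}})$$

for some scalar $\mu$.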

The negative range volume is smaller for $\omega_{\mathrm{par}_0}$ than for any such $\omega_\epsilon$ with $\omega_{\mathrm{perp}} \ne 0$.

Again we decompose $V_n$: $V_n = V_A + V_B = V_{A'} + V_{A''} + V_B$. If $(t \cdot \hat{t}) = 0$, then $V_B = 0$, and since an increase in $\lambda$ causes an increase in $V_{A'}$ which is larger than the decrease in $V_{A''}$, the negative range volume must increase monotonically and $|\omega_\epsilon|$ must decrease monotonically. Therefore, if $(t \cdot \hat{t}) > 0$ and $(t \cdot \hat{t}) \ne 1$, the smallest negative range volume must also increase monotonically and the $|\omega_\epsilon|$ minimizing the volume must decrease monotonically.

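Thus, in summary, we have shown that for a given $t$ and $\hat{t}$, the rotational error which minimizes the negative range volume is $\omega_{\mathrm{par}_0} \ne 0$. The direction of $\omega_{\mathrm{par}_0}$ is such that $(\omega_{\mathrm{par}_0} \times t) = -(\omega_{\mathrm{par}_0} \times \hat{t})$.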

4  The  Planar  Case 

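Let us express equation (4) in the more common component notation: $\dot{r} = (\dot{r}_1, \dot{r}_2, \dot{r}_3)$, where $\dot{r}_3$ is zero. If we denote $\dot{r}_1$ by $u$ and $\dot{r}_2$ by $v$, write the translation as $t = (U, V, W)$ and the rotation as $\omega = (\alpha, \beta, \gamma)$, and express the coordinates of the focus of expansion as $(x_0, y_0) = \left(\frac{U}{W} f, \frac{V}{W} f\right)$, we obtain the well-known equations

$$u = u_{tr} + u_{rot} = (x - x_0)\frac{W}{Z} + \alpha\frac{xy}{f} - \beta\left(\frac{x^2}{f} + f\right) + \gamma y$$

$$v = v_{tr} + v_{rot} = (y - y_0)\frac{W}{Z} + \alpha\left(\frac{y^2}{f} + f\right) - \beta\frac{xy}{f} - \gamma x$$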

Figure 12: As $\lambda$ increases, curve $g = 0$ changes from $g_1$ to $g_2$ and curve $C_i = 0$ changes from $C_1$ to $C_2$; the volume of negative range values between $C_i = 0$ and $(t \times s) \cdot r = 0$ decreases above area I, and the volume of negative range values between $C_i = 0$ and $g = 0$ increases above area III.

Since, due to the scaling ambiguity, only the direction of translation can possibly be obtained, we set $W = 1$ and obtain from equation (7)

$$D = \frac{(x - \hat{x}_0)\,n_x + (y - \hat{y}_0)\,n_y}{(x - x_0)\,n_x + (y - y_0)\,n_y + Z\left(\left(\alpha_\epsilon\frac{xy}{f} - \beta_\epsilon\left(\frac{x^2}{f} + f\right) + \gamma_\epsilon y\right) n_x + \left(\alpha_\epsilon\left(\frac{y^2}{f} + f\right) - \beta_\epsilon\frac{xy}{f} - \gamma_\epsilon x\right) n_y\right)} \qquad (12)$$

where $n_x$ and $n_y$ denote the components of $n$ in the $x$ and $y$ directions.
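Equation (12) can be sanity-checked by simulation: generate flow from a true motion with the well-known equations above, estimate depth from the flow component along $n$ using an erroneous motion, and compare $\hat Z / Z$ with $D$. This minimal sketch assumes $W = 1$ and defines the rotational error as $\omega_\epsilon = \omega - \hat\omega$, which is the convention consistent with the sign of the rotational terms in (12); function names and numbers are hypothetical.

```python
import numpy as np


def rotational_flow(x, y, w, f):
    """Rotational part of the planar flow equations for rotation w = (alpha, beta, gamma)."""
    alpha, beta, gamma = w
    u = alpha * x * y / f - beta * (x ** 2 / f + f) + gamma * y
    v = alpha * (y ** 2 / f + f) - beta * x * y / f - gamma * x
    return u, v


def distortion_factor(x, y, n, Z, foe, foe_hat, w_err, f):
    """Equation (12) with W = 1 along the flow direction n = (n_x, n_y)."""
    nx, ny = n
    u_rot, v_rot = rotational_flow(x, y, w_err, f)
    num = (x - foe_hat[0]) * nx + (y - foe_hat[1]) * ny
    den = (x - foe[0]) * nx + (y - foe[1]) * ny + Z * (u_rot * nx + v_rot * ny)
    return num / den


def depth_from_normal_flow(x, y, n, flow, foe_hat, w_hat, f):
    """Depth estimated from the flow component along n with the estimated motion (W = 1)."""
    nx, ny = n
    u_rot, v_rot = rotational_flow(x, y, w_hat, f)
    translational = (flow[0] - u_rot) * nx + (flow[1] - v_rot) * ny
    return ((x - foe_hat[0]) * nx + (y - foe_hat[1]) * ny) / translational


f, Z = 1.0, 4.0
x, y = 0.2, -0.1
foe, foe_hat = (0.05, 0.02), (0.08, -0.01)
w, w_hat = np.array([0.01, -0.02, 0.005]), np.array([0.012, -0.015, 0.004])
n = (np.cos(0.7), np.sin(0.7))
u_rot, v_rot = rotational_flow(x, y, w, f)
flow = ((x - foe[0]) / Z + u_rot, (y - foe[1]) / Z + v_rot)       # true flow, W = 1
Z_hat = depth_from_normal_flow(x, y, n, flow, foe_hat, w_hat, f)
D = distortion_factor(x, y, n, Z, foe, foe_hat, w - w_hat, f)
print(Z_hat / Z, D)
```

Because the rotational flow is linear in the rotation, subtracting the estimated rotational flow leaves exactly the rotational flow of $\omega_\epsilon$, which is why the two printed values agree.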

In  the  following  analysis,  we  assume  that  the  FOE  and  the  estimated  FOE  are  within 
the  image.  We  do  not  study  any  particular  image  shape,  and  we  ignore  the  exact  effects 
resulting  from  volumes  of  negative  depth  in  different  directions  being  outside  the  field 
of  view.  Such  effects  might  introduce  biases,  but  this  is  of  little  practical  interest.  Any 
implementation  of  algorithms  based  on  the  constraint  of  negative  depth  must  consider 
the  position  of  the  FOE  on  the  image,  and  cannot  just  be  based  on  blindly  counting  the 
negative  values.  We  also  perform  some  simplification:  For  a  limited  field  of  view,  the 
terms  quadratic  in  the  image  coordinates,  which  appear  in  the  rotational  components, 
are  small  with  respect  to  the  linear  and  constant  terms,  and  we  therefore  drop  them. 

The 0 distortion surface thus becomes

$$(x - \hat{x}_0)\,n_x + (y - \hat{y}_0)\,n_y = 0 \qquad (13)$$


and the $-\infty$ distortion surface takes the form

$$(x - x_0)\,n_x + (y - y_0)\,n_y + Z\big((-\beta_\epsilon f + \gamma_\epsilon y)\,n_x + (\alpha_\epsilon f - \gamma_\epsilon x)\,n_y\big) = 0 \qquad (14)$$

The flow directions $(n_x, n_y)$ can alternatively be written as $(\cos\psi, \sin\psi)$, with $\psi \in [0, \pi]$ denoting the angle between $[n_x, n_y]^T$ and the $x$ axis.

To simplify the visualization of the volumes of negative depth in different directions, we perform the following coordinate transformation to align the flow direction with the $x$ axis: for every $\psi$ we rotate the coordinate system by angle $\psi$, to obtain the new coordinates

$$[x', y']^T = R\,[x, y]^T, \quad [x_0', y_0']^T = R\,[x_0, y_0]^T, \quad [\hat{x}_0', \hat{y}_0']^T = R\,[\hat{x}_0, \hat{y}_0]^T, \quad [\alpha_\epsilon', \beta_\epsilon']^T = R\,[\alpha_\epsilon, \beta_\epsilon]^T$$

where

$$R = \begin{pmatrix} \cos\psi & \sin\psi \\ -\sin\psi & \cos\psi \end{pmatrix}$$

Equations (13) and (14) thus become


$$(x' - \hat{x}_0') = 0$$

and

$$(x' - x_0') + Z(-\beta_\epsilon' f + \gamma_\epsilon y') = 0$$
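The effect of the rotation $R$ on equations (13) and (14) can be checked numerically. The sketch below evaluates both surfaces in the original and in the rotated coordinates at random test values, with $n = (\cos\psi, \sin\psi)$ as above; the helper names are hypothetical.

```python
import numpy as np


def rotation(psi):
    """R from the text; maps the flow direction (cos psi, sin psi) onto the x' axis."""
    return np.array([[np.cos(psi), np.sin(psi)],
                     [-np.sin(psi), np.cos(psi)]])


def surfaces_original(x, y, Z, psi, foe, foe_hat, w_err, f):
    """Left-hand sides of equations (13) and (14) for n = (cos psi, sin psi)."""
    alpha, beta, gamma = w_err
    nx, ny = np.cos(psi), np.sin(psi)
    zero = (x - foe_hat[0]) * nx + (y - foe_hat[1]) * ny
    minus_inf = ((x - foe[0]) * nx + (y - foe[1]) * ny
                 + Z * ((-beta * f + gamma * y) * nx + (alpha * f - gamma * x) * ny))
    return zero, minus_inf


def surfaces_rotated(x, y, Z, psi, foe, foe_hat, w_err, f):
    """The same two surfaces written in the rotated coordinates."""
    alpha, beta, gamma = w_err
    R = rotation(psi)
    xp, yp = R @ np.array([x, y])
    x0p = (R @ np.asarray(foe))[0]
    x0hp = (R @ np.asarray(foe_hat))[0]
    beta_p = (R @ np.array([alpha, beta]))[1]     # beta' = cos(psi) beta - sin(psi) alpha
    return xp - x0hp, (xp - x0p) + Z * (-beta_p * f + gamma * yp)


rng = np.random.default_rng(4)
for _ in range(5):
    x, y, psi, Z = rng.uniform(-1, 1), rng.uniform(-1, 1), rng.uniform(0, np.pi), rng.uniform(1, 10)
    foe, foe_hat = rng.uniform(-0.5, 0.5, size=2), rng.uniform(-0.5, 0.5, size=2)
    w_err = rng.uniform(-0.05, 0.05, size=3)
    print(surfaces_original(x, y, Z, psi, foe, foe_hat, w_err, 1.0),
          surfaces_rotated(x, y, Z, psi, foe, foe_hat, w_err, 1.0))
```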

In the following proof we first consider the case of $\gamma_\epsilon = 0$ and then study the general case.


Part 1 ($\gamma_\epsilon = 0$)

If $\gamma_\epsilon = 0$, the volume of negative depth values for every direction $\psi$ lies between the surfaces

$$(x' - \hat{x}_0') = 0 \quad\text{and}\quad (x' - x_0') - \beta_\epsilon' f Z = 0$$

Equation $(x' - \hat{x}_0') = 0$ describes a plane parallel to the $y'Z$ plane at distance $\hat{x}_0'$ from the origin, and equation $(x' - x_0') - \beta_\epsilon' f Z = 0$ describes a plane parallel to the $y'$ axis of slope $1/(\beta_\epsilon' f)$, which intersects the $x'y'$ plane in the $x'$ coordinate $x_0'$. Thus we obtain a wedge-shaped volume parallel to the $y'$ axis. Figure 13 illustrates the volume through a slice parallel to the $x'Z$ plane and Figure 14 gives an illustration of this volume in space.
Figure 13: Slice parallel to the $x'Z$ plane through the volume of negative estimated depth for a single direction.


The scene in view extends between the depth values $Z_{\min}$ and $Z_{\max}$. The $-\infty$ distortion surface intersects the planes $Z = Z_{\max}$ and $Z = Z_{\min}$ in the $x'$ coordinates $x_0' + \beta_\epsilon' f Z_{\max}$ and $x_0' + \beta_\epsilon' f Z_{\min}$. As can be seen from Figure 13, whether we are given the rotational error $\beta_\epsilon'$ or the translations $x_0'$ and $\hat{x}_0'$, and thus the translational error $x_{0\epsilon}' = \cos\psi\,x_{0\epsilon} + \sin\psi\,y_{0\epsilon} = x_0' - \hat{x}_0'$, the minimum negative volume is obtained if $\hat{x}_0' = x_0' + \beta_\epsilon' f\,\frac{Z_{\min} + Z_{\max}}{2}$; in other words, the 0 distortion surface has to intersect the $-\infty$ distortion surface in the middle of the depth interval, in the plane $Z = \frac{Z_{\min} + Z_{\max}}{2}$.

Thus, for the direction defined by any angle $\psi$, the smallest volume of negative depth estimates is obtained if the rotational and translational errors are related as follows:

$$\beta_\epsilon' = -\frac{2\,x_{0\epsilon}'}{f\,(Z_{\max} + Z_{\min})}$$


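Since $\beta_\epsilon' = \cos\psi\,\beta_\epsilon - \sin\psi\,\alpha_\epsilon$ and $x_{0\epsilon}' = \cos\psi\,x_{0\epsilon} + \sin\psi\,y_{0\epsilon}$, the volume is minimized for every direction if $\frac{x_{0\epsilon}}{y_{0\epsilon}} = -\frac{\beta_\epsilon}{\alpha_\epsilon}$. In other words, the rotational error $(\alpha_\epsilon, \beta_\epsilon)$ and the translational error $(x_{0\epsilon}, y_{0\epsilon})$ have to be perpendicular to each other.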
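For $\gamma_\epsilon = 0$ the wedge has the same cross-section for every $y'$, so its volume per unit length in $y'$ is the area of the $x'Z$ slice, $\int_{Z_{\min}}^{Z_{\max}} |\hat{x}_0' - x_0' - \beta_\epsilon' f Z|\,dZ$. The sketch below minimizes this area over $\hat{x}_0'$ by brute force (midpoint rule, arbitrary test values, hypothetical names) and compares the result with $x_0' + \beta_\epsilon' f \frac{Z_{\min} + Z_{\max}}{2}$; it is an illustration, not the authors' procedure.

```python
import numpy as np


def wedge_area(x0h_p, x0_p, beta_p, f, Z_min, Z_max, n=20_000):
    """Area of the x'Z slice of the negative-depth wedge for gamma_eps = 0: the region between
    x' = x0h_p and x' = x0_p + beta_p f Z for Z in [Z_min, Z_max] (midpoint rule)."""
    edges = np.linspace(Z_min, Z_max, n + 1)
    Z = 0.5 * (edges[:-1] + edges[1:])
    width = np.abs(x0h_p - (x0_p + beta_p * f * Z))
    return width.mean() * (Z_max - Z_min)


x0_p, beta_p, f, Z_min, Z_max = 0.1, 0.02, 1.0, 2.0, 10.0
candidates = np.linspace(x0_p - 0.5, x0_p + 0.5, 2001)
areas = [wedge_area(c, x0_p, beta_p, f, Z_min, Z_max) for c in candidates]
best = candidates[int(np.argmin(areas))]
predicted = x0_p + beta_p * f * (Z_min + Z_max) / 2     # middle of the depth interval
print(best, predicted)
```

Since the width of the slice is linear in $Z$, the minimizer is the median of $x_0' + \beta_\epsilon' f Z$ over the depth interval, which for a uniform interval is its midpoint.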


Part 2 ($\gamma_\epsilon \ne 0$)

If $\gamma_\epsilon \ne 0$, the $-\infty$ distortion surface becomes

$$(x' - x_0') + Z(-\beta_\epsilon' f + \gamma_\epsilon y') = 0$$

This surface can be most easily understood by slicing it with planes parallel to the $x'y'$ plane. At every depth value $Z$, we obtain a line of slope $-1/(\gamma_\epsilon Z)$ which intersects the $x'$ axis in $x' = x_0' + \beta_\epsilon' f Z$ (see Figure 15). For any given $Z$ the slopes of the lines in different directions are the same. An illustration of the volume of negative depth values is given in Figure 16.

Figure 14: $\gamma_\epsilon = 0$: the volume of negative depth values for a single direction between the 0 and $-\infty$ distortion surfaces.

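In Part 1 of this analysis we found that if $\gamma_\epsilon = 0$, the smallest volume of negative depth values is obtained if $\hat{x}_0' = x_0' + \beta_\epsilon' f\,\frac{Z_{\min} + Z_{\max}}{2}$, and the intersection of the 0 and $-\infty$ distortion surfaces is at $Z = \frac{Z_{\min} + Z_{\max}}{2}$. In order to derive the position of $\hat{x}_0'$ that minimizes the negative depth values for the general case of $\gamma_\epsilon \ne 0$, we study the change of volume as $\hat{x}_0'$ changes from $x_0' + \beta_\epsilon' f\,\frac{Z_{\min} + Z_{\max}}{2}$.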

Referring to Figure 17, it can be seen that for any depth value $Z$, a change in the position of $\hat{x}_0'$ to $\hat{x}_0' + d$, assuming $\gamma_\epsilon \ne 0$, causes the corresponding area of negative depth values to change by $A_c$, where

$$A_c = -(y_1' + y_2')\,d\,\operatorname{sgn}(\gamma_\epsilon)$$

and $y_1'$ and $y_2'$ denote the $y'$ coordinates of the intersection points of the $-\infty$ distortion contour at depth $Z$ with the 0 distortion contours $x' = \hat{x}_0'$ and $x' = \hat{x}_0' + d$.

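Figure 15: Slices parallel to the $x'y'$ plane through the 0 distortion surface ($C_0$) and the $-\infty$ distortion surfaces at three depth values ($C_1$, $C_2$, and $C_3$, the last at $Z = Z_{\max}$).

By intersecting the $-\infty$ distortion contour $x' - x_0' + Z(-\beta_\epsilon' f + \gamma_\epsilon y') = 0$ with the 0 distortion contours $x' - \left(x_0' + \beta_\epsilon' f\,\frac{Z_{\min} + Z_{\max}}{2}\right) = 0$ and $x' - \left(x_0' + \beta_\epsilon' f\,\frac{Z_{\min} + Z_{\max}}{2} + d\right) = 0$, we obtain

$$y_1' = \frac{\beta_\epsilon' f\,(2Z - Z_{\min} - Z_{\max})}{2Z\gamma_\epsilon} \quad\text{and}\quad y_2' = y_1' - \frac{d}{Z\gamma_\epsilon}$$

and therefore

$$A_c = -\frac{\operatorname{sgn}(\gamma_\epsilon)}{Z\gamma_\epsilon}\left(\beta_\epsilon' f\,(2Z - Z_{\min} - Z_{\max})\,d - d^2\right)$$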
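The expressions for $y_1'$, $y_2'$ and $A_c$ can be checked by intersecting the contours numerically. In this sketch (hypothetical names, arbitrary test values), `area_change_direct` computes the two intersection points and applies $A_c = -(y_1' + y_2')\,d\,\operatorname{sgn}(\gamma_\epsilon)$, while `area_change_closed` uses the closed form above.

```python
import numpy as np


def y_on_minus_inf_contour(x_p, x0_p, beta_p, gamma, f, Z):
    """y' where the -inf distortion contour x' - x0' + Z(-beta' f + gamma y') = 0 meets x' = x_p."""
    return (x0_p + beta_p * f * Z - x_p) / (gamma * Z)


def area_change_direct(x0_p, beta_p, gamma, f, Z, Z_min, Z_max, d):
    """A_c = -(y1' + y2') d sgn(gamma) from the two intersection points."""
    x_c = x0_p + beta_p * f * (Z_min + Z_max) / 2        # starting position of x0_hat'
    y1 = y_on_minus_inf_contour(x_c, x0_p, beta_p, gamma, f, Z)
    y2 = y_on_minus_inf_contour(x_c + d, x0_p, beta_p, gamma, f, Z)
    return -(y1 + y2) * d * np.sign(gamma)


def area_change_closed(beta_p, gamma, f, Z, Z_min, Z_max, d):
    """Closed form of A_c."""
    return -np.sign(gamma) / (Z * gamma) * (beta_p * f * (2 * Z - Z_min - Z_max) * d - d ** 2)


for gamma in (0.01, -0.03):
    for Z in (2.0, 5.0, 9.0):
        print(area_change_direct(0.1, 0.02, gamma, 1.0, Z, 2.0, 10.0, -0.01),
              area_change_closed(0.02, gamma, 1.0, Z, 2.0, 10.0, -0.01))
```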

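The change in negative depth volume for any direction, $V_c$, is thus given by

$$V_c = \int_{Z_{\min}}^{Z_{\max}} A_c\,dZ$$

which amounts to

$$V_c = -\frac{1}{|\gamma_\epsilon|}\left(\beta_\epsilon' f\,d\left(2(Z_{\max} - Z_{\min}) - (Z_{\min} + Z_{\max})\ln\frac{Z_{\max}}{Z_{\min}}\right) - d^2\ln\frac{Z_{\max}}{Z_{\min}}\right)$$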

It can be verified that in order for $V_c$ to be negative, $\operatorname{sgn}(\beta_\epsilon') = -\operatorname{sgn}(d)$. This means that $\hat{x}_0' + d$ lies between $x_0'$ and $\hat{x}_0'$ (see Figure 17).
Figure 16: $\gamma_\epsilon \ne 0$: volume of negative depth values between the 0 and $-\infty$ distortion surfaces.


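We are interested in the $d$ which minimizes $V_c$. By solving $\frac{\partial V_c}{\partial d} = 0$ we obtain

$$d = \beta_\epsilon' f\left(\frac{Z_{\max} - Z_{\min}}{\ln(Z_{\max}/Z_{\min})} - \frac{Z_{\min} + Z_{\max}}{2}\right)$$

The change, $Z_c$, in the $Z$ coordinate of the intersection between the 0 and $-\infty$ distortion surfaces is $Z_c = \frac{d}{\beta_\epsilon' f}$, and thus the intersection for the smallest negative depth volume is at

$$Z_m = \frac{Z_{\max} - Z_{\min}}{\ln(Z_{\max}/Z_{\min})} \qquad (15)$$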

Since $Z_m$ is the same for all flow directions, the total negative depth volume is minimized if the volume in every direction is minimized. Therefore we have the constraint

$$\frac{x_{0\epsilon}}{y_{0\epsilon}} = -\frac{\beta_\epsilon}{\alpha_\epsilon} \qquad (16)$$
Figure 17: A change of $\hat{x}_0'$ to $\hat{x}_0' + d$ causes the area of negative depth values $A_c$ to increase by area $A_1$ and to decrease by area $A_2$. This change amounts to $A_c = -(y_1' + y_2')\,d\,\operatorname{sgn}(\gamma_\epsilon)$.


For a given rotational error $(\alpha_\epsilon, \beta_\epsilon, \gamma_\epsilon)$, equation (16) defines the direction of the FOE of the translational error on the image plane. For a given translational error $(x_{0\epsilon}, y_{0\epsilon})$, equation (16) defines the direction of the AOR of the rotational error on the image. In addition we must have $\gamma_\epsilon = 0$.

Some comment on the finiteness of the image is necessary here. The values $A_c$ and $V_c$ have been derived for an infinitely large image. If $\gamma_\epsilon$ is very small or some of the depth values $Z$ in the interval $[Z_{\min}, Z_{\max}]$ are small, the coordinates of the intersections $y_1'$ and $y_2'$ do not lie within the image. The value of $A_c$ can be at most the length of the image times $d$. Since the slope of the $-\infty$ distortion contour for a given $Z$ is the same for all directions, this will have very little effect on the relationship between the translational and rotational motion errors. It does, however, affect the value $Z_m$.

Assuming the intersections are within the image, we can also derive the relative values of the motion errors; the amount of error depends on the interval of depth values of the scene in view. Since for every direction

$$\hat{x}_0' = x_0' + \beta_\epsilon' f Z_m,$$

we obtain by substitution from equation (15)

$$x_{0_\epsilon} = -\beta_\epsilon f Z_m$$

and

$$y_{0_\epsilon} = \alpha_\epsilon f Z_m,$$

with $Z_m$ given by equation (15).


5  Shape  Estimation  in  the  Presence  of  Distortion 

The above results are of great importance for the analysis of shape estimation. An error configuration satisfying equation (16), with $\gamma_\epsilon = 0$, guarantees that near the fixation center the derived shape map of the scene is very well behaved.

Near  the  image  center  the  image  coordinates  are  very  small.  Thus  using  equation  (12) 
the  distortion  factor  there  can  be  approximated  by 

$$D \approx \frac{\hat{x}_0 n_x + \hat{y}_0 n_y}{x_0 n_x + y_0 n_y + Z f (\beta_\epsilon n_x - \alpha_\epsilon n_y)}$$

If, for a given $Z$, the coefficients of $n_x$ and $n_y$ in the numerator are proportional to those in the denominator, the numerator is a multiple of the denominator, and thus the distortion factor is the same for every direction $(n_x, n_y)$. This means that scene points of the same depth are distorted by the same factor, and the computed depth map has the same level contours as the actual depth map of the scene.
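The proportionality argument above can be checked numerically. The Python sketch below assumes the near-center approximation has the form $D = (\hat{x}_0 n_x + \hat{y}_0 n_y)/(x_0 n_x + y_0 n_y + Z f(\beta_\epsilon n_x - \alpha_\epsilon n_y))$, with $(\hat{x}_0, \hat{y}_0)$ the estimated FOE. The function names and all numeric values are illustrative only. It evaluates $D$ over a set of unit directions for one depth $Z$: the factor is direction-independent when the coefficient pairs are proportional and varies otherwise.

```python
import math
from typing import Dict, List


def distortion_factor(nx: float, ny: float, x0: float, y0: float,
                      x0_hat: float, y0_hat: float,
                      alpha_e: float, beta_e: float, Z: float, f: float) -> float:
    """Assumed near-center distortion factor for the flow direction (nx, ny)."""
    num = x0_hat * nx + y0_hat * ny
    den = x0 * nx + y0 * ny + Z * f * (beta_e * nx - alpha_e * ny)
    return num / den


def factors_over_directions(params: Dict[str, float], count: int = 12) -> List[float]:
    """Evaluate the factor for `count` directions evenly spaced in [0, pi)."""
    values = []
    for k in range(count):
        theta = math.pi * k / count
        values.append(distortion_factor(math.cos(theta), math.sin(theta), **params))
    return values


# Denominator coefficients: (x0 + Z f beta_e, y0 - Z f alpha_e) = (2.0, 1.5),
# and the numerator coefficients (4.0, 3.0) are twice these, so D = 2 everywhere.
proportional = dict(x0=1.0, y0=2.0, x0_hat=4.0, y0_hat=3.0,
                    alpha_e=0.25, beta_e=0.5, Z=2.0, f=1.0)
# Same setup, but (x0_hat, y0_hat) = (4, 1) is no longer proportional to (2, 1.5).
non_proportional = dict(proportional, y0_hat=1.0)
```

Directions are sampled on $[0, \pi)$ only, because $\mathbf{n}$ and $-\mathbf{n}$ give the same ratio. Directions where the denominator vanishes (the $-\infty$ contour) are not handled; the example values avoid them.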

Depending  on  the  sign  of  the  rotational  error,  there  will  either  be  an  overestimation 
for  the  nearby  scene  and  an  underestimation  for  the  far  scene  or  vice  versa.  All  the 
distortion,  however,  takes  place  only  in  the  Z  dimension.  Thus  the  resulting  depth 
function  involves  an  affine  transformation.  The  invariants  of  these  shape  maps  have  been 
studied in the work of Koenderink and van Doorn [22, 23].


6  Conclusions 

An  algorithm-independent  stability  analysis  of  structure  from  motion  has  been  presented. 
The  analysis  did  not  make  any  assumptions  about  the  scene,  and  was  based  solely  on 
the  fact  that  the  depth  of  the  scene — in  order  for  the  scene  to  be  visible — has  to  be 
positive.  As  input  to  the  structure  from  motion  process  we  did  not  consider  optic  flow 
or correspondence, but the value of the flow at every point along some direction, a
quantity  more  easily  computable.  Our  stability  analysis  amounts  to  an  understanding  of 
the  coupling  of  the  translational  and  rotational  error.  Given  an  error  in  the  translation 
(or  the  rotation),  we  asked:  what  is  the  value  of  the  rotation  (or  the  translation)  that 
estimates  the  minimum  number  of  negative  depth  values?  We  performed  the  analysis  for 
both  a  spherical  and  a  planar  retina.  For  the  case  of  a  planar  retina  we  found  that  the 
configuration  of  the  rotational  and  translational  errors  resulting  in  minimum  negative 
depth  is  the  one  in  which  the  projections  of  the  two  error  vectors  on  the  image  plane 
are  perpendicular  to  each  other.  For  the  case  of  a  spherical  retina,  we  found  that  given 
a  rotational  error,  the  optimal  translation  is  the  correct  one,  while  given  an  error  in 
translation, the optimal rotational error is perpendicular to the translational error at an
equal  distance  from  the  real  and  estimated  translations. 

These  results,  besides  their  potential  use  in  structure  from  motion  algorithms,  also 
represent  a  computational  analysis  comparing  different  eye  constructions  in  the  natural 
world.  The  results  on  the  sphere  demonstrate  that  it  is  very  easy  for  a  system  with 
panoramic  vision  to  estimate  its  self-motion.  Indeed,  if  the  system  possesses  an  inertial 
sensor  providing  its  rotation  with  some  error,  we  have  shown  that  after  derotation,  a 
simple  algorithm  considering  only  translation  based  on  normal  flow  will  estimate  the 
translation  optimally.  This  suggests  that  spherical  eye  design  is  optimal  for  flying  systems 
such  as  the  compound  eyes  of  insects  and  the  panoramic  vision  of  birds. 

The  analysis  on  the  plane  revealed  that  for  an  optimal  configuration  of  errors,  the 
estimated depth distorts only in the $Z$ direction, with the level contours of the depth
function distorting by the same amount, thus making it feasible to extract meaningful
shape  representations.  This  suggests  that  the  camera-type  eyes  of  primates  are  possibly 
optimal  for  systems  that  need  good  shape  computation  capabilities. 
Appendix A Re-parameterization of Flow Directions

Let us choose a uniformly distributed flow field direction $n_1(\psi)$ as follows. The coordinates of $r = [x, y, z]^T$ at every point on the unit sphere are obtained through a rotation of the point $[0, 0, 1]^T$ by an angle $\varphi_x$ around the $x$ axis followed by a rotation of angle $\varphi_y$ around the $y$ axis. Thus the rotation matrix $R$ is given by


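$$R = \begin{pmatrix} \cos\varphi_y & 0 & \sin\varphi_y \\ -\sin\varphi_x \sin\varphi_y & \cos\varphi_x & \sin\varphi_x \cos\varphi_y \\ -\cos\varphi_x \sin\varphi_y & -\sin\varphi_x & \cos\varphi_x \cos\varphi_y \end{pmatrix}$$

and every point $r = [\sin\varphi_y, \sin\varphi_x \cos\varphi_y, \cos\varphi_x \cos\varphi_y]^T$.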
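As a quick consistency check of the matrix above, the following Python sketch (helper names hypothetical, plain lists instead of a linear-algebra library) builds $R$ from $\varphi_x$ and $\varphi_y$, verifies that it is orthonormal, and recovers $r$ as the image of $[0, 0, 1]^T$, that is, the third column of $R$.

```python
import math
from typing import List

Vec = List[float]
Mat = List[Vec]


def rotation_matrix(phi_x: float, phi_y: float) -> Mat:
    """R as written above, in row-major order."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    return [[cy, 0.0, sy],
            [-sx * sy, cx, sx * cy],
            [-cx * sy, -sx, cx * cy]]


def mat_vec(m: Mat, v: Vec) -> Vec:
    """Matrix-vector product for 3x3 matrices."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def is_orthonormal(m: Mat, tol: float = 1e-12) -> bool:
    """True if the rows of m are orthonormal, i.e. m m^T = I within tol."""
    for i in range(3):
        for j in range(3):
            d = sum(m[i][k] * m[j][k] for k in range(3))
            if abs(d - (1.0 if i == j else 0.0)) > tol:
                return False
    return True


def sphere_point(phi_x: float, phi_y: float) -> Vec:
    """r = R [0, 0, 1]^T."""
    return mat_vec(rotation_matrix(phi_x, phi_y), [0.0, 0.0, 1.0])
```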

Vectors $n_1(\psi)$ are obtained through rotation of the unit vector $[\sin\psi, \cos\psi, 0]^T$ at point $[0, 0, 1]^T$. Thus

$$n_1(\psi) = [\cos\varphi_y \sin\psi,\; -\sin\varphi_x \sin\varphi_y \sin\psi + \cos\varphi_x \cos\psi,\; -\cos\varphi_x \sin\varphi_y \sin\psi - \sin\varphi_x \cos\psi]^T$$
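The closed form of $n_1(\psi)$ can be cross-checked against the rotation that defines it. The Python sketch below (helper names hypothetical) computes $n_1$ both as $R\,[\sin\psi, \cos\psi, 0]^T$ and from the component formula. Because $R$ is orthonormal and the starting vector is perpendicular to $[0, 0, 1]^T$, the result should be a unit vector tangent to the sphere at $r$.

```python
import math
from typing import List

Vec = List[float]


def rotation_matrix(phi_x: float, phi_y: float) -> List[Vec]:
    """R as given in the text, row-major."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    return [[cy, 0.0, sy],
            [-sx * sy, cx, sx * cy],
            [-cx * sy, -sx, cx * cy]]


def n1_by_rotation(psi: float, phi_x: float, phi_y: float) -> Vec:
    """Rotate the tangent vector [sin psi, cos psi, 0]^T at the pole by R."""
    R = rotation_matrix(phi_x, phi_y)
    v = [math.sin(psi), math.cos(psi), 0.0]
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]


def n1_closed_form(psi: float, phi_x: float, phi_y: float) -> Vec:
    """Component formula for n1(psi) from the text."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    return [cy * math.sin(psi),
            -sx * sy * math.sin(psi) + cx * math.cos(psi),
            -cx * sy * math.sin(psi) - sx * math.cos(psi)]
```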

On the other hand, the direction $n_2(\chi)$ used in the analysis in Section 3.1 is chosen to be $n_2(\chi) = r \times s(\chi)$ with $s = [0, \sin\chi, \cos\chi]^T$.

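Thus, up to sign, $n_2(\chi) = [\cos\varphi_x \cos\varphi_y \sin\chi - \sin\varphi_x \cos\varphi_y \cos\chi,\; \sin\varphi_y \cos\chi,\; -\sin\varphi_y \sin\chi]^T$ (these components are those of $s(\chi) \times r$; the sign does not affect parallelism). Since both $n_1(\psi)$ and $n_2(\chi)$ are tangent to the sphere at $r$, their cross product is parallel to $r$, so in order for $n_1(\psi)$ to be parallel to $n_2(\chi)$ the following must hold:

$$(n_1(\psi) \times n_2(\chi)) \cdot r = 0$$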

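Thus $\psi = g(\chi) = \arctan\left(\dfrac{\tan(\chi - \varphi_x)}{\sin\varphi_y}\right)$ and the normalization factor $\left\|\dfrac{d\psi}{d\chi}\right\|$ is

$$\left\|\frac{d\psi}{d\chi}\right\| = \left|\frac{\sin\varphi_y}{\cos^2\varphi_y \cos^2(\chi - \varphi_x) - 1}\right|$$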

For  an  illustration  see  Figure  6. 
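The parallelism condition and the resulting reparameterization can be checked numerically. The Python sketch below (helper names hypothetical) evaluates $n_1(\psi)$ and $n_2(\chi)$ from the component formulas above, confirms that the listed components of $n_2$ equal $s(\chi) \times r$, and compares the closed-form normalization factor with a finite-difference derivative of $g$. It assumes $\psi = g(\chi) = \arctan(\tan(\chi - \varphi_x)/\sin\varphi_y)$, which follows from requiring $n_1(\psi) \cdot s(\chi) = 0$ (for two vectors tangent at $r$ this is equivalent to $(n_1 \times n_2) \cdot r = 0$), and $\sin\varphi_y \neq 0$.

```python
import math
from typing import List

Vec = List[float]


def cross(a: Vec, b: Vec) -> Vec:
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def r_point(phi_x: float, phi_y: float) -> Vec:
    return [math.sin(phi_y),
            math.sin(phi_x) * math.cos(phi_y),
            math.cos(phi_x) * math.cos(phi_y)]


def n1(psi: float, phi_x: float, phi_y: float) -> Vec:
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    return [cy * math.sin(psi),
            -sx * sy * math.sin(psi) + cx * math.cos(psi),
            -cx * sy * math.sin(psi) - sx * math.cos(psi)]


def n2(chi: float, phi_x: float, phi_y: float) -> Vec:
    """Components of n2(chi) as listed in the text."""
    cx, sx = math.cos(phi_x), math.sin(phi_x)
    cy, sy = math.cos(phi_y), math.sin(phi_y)
    return [cx * cy * math.sin(chi) - sx * cy * math.cos(chi),
            sy * math.cos(chi),
            -sy * math.sin(chi)]


def parallel_residual(psi: float, chi: float, phi_x: float, phi_y: float) -> float:
    """(n1(psi) x n2(chi)) . r, which vanishes when the two tangent vectors are parallel."""
    return dot(cross(n1(psi, phi_x, phi_y), n2(chi, phi_x, phi_y)), r_point(phi_x, phi_y))


def g(chi: float, phi_x: float, phi_y: float) -> float:
    """psi = arctan(tan(chi - phi_x) / sin(phi_y)); assumes sin(phi_y) != 0."""
    return math.atan(math.tan(chi - phi_x) / math.sin(phi_y))


def dpsi_dchi_norm(chi: float, phi_x: float, phi_y: float) -> float:
    """Closed-form normalization factor |d psi / d chi|."""
    c2 = math.cos(phi_y) ** 2 * math.cos(chi - phi_x) ** 2
    return abs(math.sin(phi_y) / (c2 - 1.0))


def density_integral(phi_x: float, phi_y: float, samples: int = 2000) -> float:
    """Trapezoidal rule over one period [0, pi) of chi (the integrand is pi-periodic)."""
    h = math.pi / samples
    return h * sum(dpsi_dchi_norm(k * h, phi_x, phi_y) for k in range(samples))
```

Over one period of $\chi$ the factor integrates to $\pi$, consistent with $g$ mapping a period of $\chi$ one-to-one onto a period of $\psi$; the periodic trapezoidal rule is used because it converges very quickly for smooth periodic integrands.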